Preface



Static analysis is a research area aimed at developing principles and tools for ver- 
ification and semantics-based manipulation of programs and high-performance 
implementation of programming languages. The series of Static Analysis Sym- 
posia is a forum for the presentation and discussion of advances in the area. 

This volume contains the papers presented at the Eighth International Static 
Analysis Symposium (SAS 2001), which was held July 16-18, 2001 at the Sor- 
bonne in Paris, France. Previous SAS symposia were held in Santa 
Barbara, CA, USA (LNCS 1824), Venice, Italy (LNCS 1694), Pisa, Italy (LNCS 
1503), Paris, France (LNCS 1302), Aachen, Germany (LNCS 1145), Glasgow, 
UK (LNCS 983), Namur, Belgium (LNCS 864), following the international work- 
shop WSA in Padova, Italy (LNCS 724), Bordeaux, France (Bigre Vol. 81-82) 
and JTASPEFL/WSA, Bordeaux, France (Bigre Vol. 74). 

The program committee meeting was held at the Ecole Normale Superieure 
in Paris on March 31, 2001, and 21 papers were selected from 62 submissions. 
In addition to the contributed papers, this volume includes invited papers by 
Rustan Leino and Martin Rinard. This volume also contains the abstracts of an 
invited talk by Fred Schneider and of the presentations by Bruno Blanchet, 
Andrew Gordon, Andrew Myers, and David Wagner at an invited session 
on security. 
Abstract. The field of program analysis has focused primarily on sequential programming languages. But multithreading is becoming increasingly important, both as a program structuring mechanism and to
support efficient parallel computations. This paper surveys research in 
analysis for multithreaded programs, focusing on ways to improve the ef- 
ficiency of analyzing interactions between threads, to detect data races, 
and to ameliorate the impact of weak memory consistency models. We 
identify two distinct classes of multithreaded programs, activity manage- 
ment programs and parallel computing programs, and discuss how the 
structure of these kinds of programs leads to different solutions to these 
problems. Specifically, we conclude that augmented type systems are the 
most promising approach for activity management programs, while tar- 
geted program analyses are the most promising approach for parallel 
computing programs. 



1 Introduction 

Multithreading is a widely used structuring technique for modern software. Pro- 
grammers use multiple threads of control for a variety of reasons: to build respon- 
sive servers that interact with multiple clients, to run computations in parallel 
on a multiprocessor for performance, and as a structuring mechanism for imple- 
menting rich user interfaces. In general, threads are useful whenever the software 
needs to manage a set of tasks with varying interaction latencies, exploit multiple 
physical resources, or execute largely independent tasks in response to multiple 
external events. 

Developing analyses for multithreaded programs can be a challenging activ- 
ity. The primary complication is characterizing the effect of the interactions be- 
tween threads. The obvious approach of analyzing all interleavings of statements 
from parallel threads fails because of the resulting exponential analysis times. A 
central challenge is therefore developing efficient abstractions and analyses that 
capture the effect of each thread’s actions on other parallel threads. 
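To make the blow-up concrete: threads with n1, ..., nk statements admit (n1 + ... + nk)! / (n1! ... nk!) interleavings that preserve each thread's own statement order, even before branches and loops are considered. The Python sketch below (function names are illustrative, not from the paper) counts interleavings both by brute-force enumeration and by this multinomial formula.

```python
from math import factorial, prod


def count_interleavings(lengths):
    """Closed form: the multinomial coefficient (n1 + ... + nk)! / (n1! ... nk!)."""
    return factorial(sum(lengths)) // prod(factorial(n) for n in lengths)


def enumerate_interleavings(threads):
    """Yield every interleaving that preserves each thread's own statement order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            # Let thread i take its next statement, then interleave the rest.
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in enumerate_interleavings(rest):
                yield [t[0]] + tail


threads = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
print(sum(1 for _ in enumerate_interleavings(threads)))  # 20
print(count_interleavings([3, 3]))                       # 20
print(count_interleavings([10, 10]))                     # 184756
print(count_interleavings([10, 10, 10]))                 # 5550996791340
```

Two threads of ten statements each already yield 184,756 interleavings; a third such thread raises the count to roughly 5.6 trillion.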

Researchers have identified several ways to use the results of analyzing mul- 
tithreaded programs. Multithreading enables several new kinds of programming 
errors; the potential severity of these errors and difficulty of exposing them via 

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. l-CH 2001. 
testing have inspired the development of analyses that detect these errors statically. Most of the research in this area has focused on detecting data races
(which occur when two threads access the same data without synchronization 
and one of the accesses is a write) and deadlocks (which occur when threads 
are permanently blocked waiting for resources). Researchers have also developed 
optimizations for multithreaded programs; some of these optimizations gener- 
alize existing optimizations for sequential programs while others are specific to 
multithreaded programs. 

After surveying research in the analysis and optimization of multithreaded 
programs, we discuss the issues associated with detecting data races in more 
depth. We first identify two distinct classes of multithreaded programs, activ- 
ity management programs, which use threads to manage a set of conceptually 
concurrent activities, and parallel computing programs, which use threads to ex- 
ecute computations in parallel for performance on a multiprocessor. For activity 
management programs, we conclude that the appropriate mechanism is an aug- 
mented type system that guarantees that the program is free of data races. 
Because such a type system would provide information about the potential in- 
teractions between parallel threads, it could also serve as a foundation for new, 
very precise analyses. For parallel computing programs, we conclude that the ap- 
propriate mechanism is a set of specialized analyses, each tailored for a specific 
concurrency and data usage pattern. 

The remainder of the paper is structured as follows. Section 2 surveys uses of the analysis information, while Section 3 discusses the analyses researchers have developed for multithreaded programs, focusing on ways to improve the efficiency of analyzing interactions between parallel threads. Sections 4 and 5 discuss data race detection for activity management programs and parallel computing programs, respectively. Section 6 presents several issues associated with the use of weak memory consistency models. We conclude in Section 7.

2 Analysis Uses 

Researchers have proposed several uses for analysis information extracted from 
multithreaded programs. The first use is to enable optimizations, both gener- 
alizations of traditional compiler optimizations to multithreaded programs and 
optimizations that make sense only for multithreaded programs. The second use 
is to detect anomalies in the parallel execution such as data races or deadlock. 

2.1 Optimization Uses 

A problem with directly applying traditional compiler optimizations to mul- 
tithreaded programs is that the optimizations may reorder accesses to shared 
data in ways that may be observed by threads running concurrently with the 
transformed thread [...]. One approach is to generalize standard program representations, analyses, and transformations to safely optimize multithreaded programs even in the presence of accesses to shared data [...].
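A minimal sketch of the hazard, using a hypothetical flag-passing example and assuming sequentially consistent execution (each interleaving runs whole statements in program order). Swapping the two stores looks legal to a sequential optimizer because they touch different variables, yet it makes the outcome r1 = 1, r2 = 0 observable to a concurrent reader.

```python
def outcomes(writer, reader):
    """All (r1, r2) results of running writer || reader under sequential consistency.

    writer: list of (variable, value) stores; reader: list of (register, variable)
    loads.  Every interleaving that preserves each thread's program order is explored.
    """
    results = set()

    def step(memory, registers, w, r):
        if not w and not r:
            results.add((registers["r1"], registers["r2"]))
            return
        if w:  # let the writer take its next step
            variable, value = w[0]
            step({**memory, variable: value}, registers, w[1:], r)
        if r:  # or let the reader take its next step
            register, variable = r[0]
            step(memory, {**registers, register: memory[variable]}, w, r[1:])

    step({"data": 0, "flag": 0}, {"r1": None, "r2": None}, writer, reader)
    return results


reader = [("r1", "flag"), ("r2", "data")]  # read flag, then data
original = [("data", 1), ("flag", 1)]      # write data, then set flag
reordered = [("flag", 1), ("data", 1)]     # independent stores, swapped

print(sorted(outcomes(original, reader)))   # [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(reordered, reader)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```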
The presence of multithreading may also inspire optimizations with no obvious counterpart in the optimization of sequential programs. Examples include communication optimizations [...], optimizing mutual exclusion synchronization [...], and optimizing barrier synchronization [96]. A
more conservative approach is to ensure that the optimizations preserve the se- 
mantics of the original program by first identifying regions of the program that 
do not interact with other threads, then applying optimizations only within these 
regions. The analysis problem is determining which statements may interact with 
other threads and which may not. Escape analysis is an obvious analysis to use 
for this purpose — it recognizes data that is captured within the current thread 
and therefore inaccessible to other threads [...]. The programming model may also separate shared and private data, and in some cases the analysis may automatically infer when pointers point to private data [...]. More
elaborate analyses may recognize actions (such as acquiring a mutual exclusion 
lock or obtaining the only existing reference to an object) that temporarily give 
the thread exclusive access to specific objects potentially accessed by multiple 
threads. A final approach is to expect the programmer to correctly synchronize 
the program, then enable traditional compiler optimizations within any region 
that does not contain an action (for example, a synchronization action or thread 
creation action) that is designed to mediate interactions between threads [...]. This approach has the advantage that it eliminates the need to perform a potentially expensive interthread analysis as a prerequisite for applying traditional optimizations to multithreaded programs. The (serious) disadvantage is that, if the programmer fails to synchronize the program correctly, the optimizations may change the result that the program computes.
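The last approach amounts to partitioning each thread's instruction stream at the actions that mediate interactions between threads. The sketch below is illustrative only: the (opcode, operand) encoding and the opcode names in SYNC_ACTIONS are hypothetical, and a real compiler would operate on its own intermediate representation.

```python
SYNC_ACTIONS = {"acquire", "release", "spawn", "join"}  # hypothetical opcode names


def optimization_regions(instructions):
    """Split one thread's (opcode, operand) list into maximal sync-free regions.

    Synchronization actions are region boundaries and are never moved.  Assuming
    the program is correctly synchronized, a sequential optimizer may reorder
    instructions within each returned region, subject to the usual sequential
    dependences.
    """
    regions, current = [], []
    for opcode, operand in instructions:
        if opcode in SYNC_ACTIONS:
            if current:
                regions.append(current)
            current = []
        else:
            current.append((opcode, operand))
    if current:
        regions.append(current)
    return regions


code = [("load", "a"), ("store", "b"), ("acquire", "L"),
        ("load", "c"), ("store", "c"), ("release", "L"), ("store", "d")]
print(optimization_regions(code))
# [[('load', 'a'), ('store', 'b')], [('load', 'c'), ('store', 'c')], [('store', 'd')]]
```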



2.2 Data Race Detection 

In an unsafe language like C, there are a number of program actions that are 
almost always the result of programmer error, regardless of the context in which 
they occur. Examples include array bounds violations and accessing memory 
after it has been deallocated. If the program engages in these actions, it can 
produce behavior that is very difficult to understand. Several well-known lan- 
guage design and implementation techniques (garbage collection, array bounds 
checks) can completely eliminate these kinds of errors. The cost is additional ex- 
ecution overhead and a loss of programmer control over aspects of the program’s 
execution. As a result, for many years the dominant programming language (C) provided no protection at all against this class of errors.

For programs that use threads, an analogous error is a data race, which occurs 
when multiple threads access the same data without an intervening synchroniza- 
tion operation, and one of the accesses is a write. A data race is almost always 
the result of a programming error, with a common outcome being the corruption 
of the accessed data structures. The fact that data races may show up only in- 
termittently due to different timings on different executions adds an extra layer 
of complexity not present for sequential programs. The most widely used multithreaded languages, Java, C, and C++ (augmented with a threads package),
leave the programmer totally responsible for avoiding data races by correctly 
synchronizing the computation. The result is that many systems builders view the use of threads as an inherently unsafe programming practice [...].
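The definition can be made precise with the happens-before relation: "without an intervening synchronization operation" becomes "synchronization orders neither access before the other". The sketch below assumes each access has been stamped with a vector clock; the Access record and its fields are hypothetical, and vector clocks are one standard representation of happens-before rather than something this paper prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: int
    location: str
    is_write: bool
    clock: tuple  # vector clock: clock[t] counts the events of thread t this access has seen


def happens_before(a, b):
    """a precedes b if b's clock dominates a's componentwise and they differ."""
    return a.clock != b.clock and all(x <= y for x, y in zip(a.clock, b.clock))


def is_data_race(a, b):
    """Different threads, same location, at least one write, and no ordering."""
    return (a.thread != b.thread
            and a.location == b.location
            and (a.is_write or b.is_write)
            and not happens_before(a, b)
            and not happens_before(b, a))


write_x = Access(thread=0, location="x", is_write=True, clock=(1, 0))
unsynchronized_read = Access(thread=1, location="x", is_write=False, clock=(0, 1))
# Thread 1 acquired a lock after thread 0 released it, so its clock absorbed (1, 0).
synchronized_read = Access(thread=1, location="x", is_write=False, clock=(1, 1))

print(is_data_race(write_x, unsynchronized_read))  # True
print(is_data_race(write_x, synchronized_read))    # False
```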

Presented with this problem, researchers have developed a set of analyses 
for determining if a program may have a data race. Some analyses allow the 
programmer to declare an association between data and locks, then check that 
the program holds the lock whenever it accesses the corresponding data [...]. Other analyses trace the control transfers associated with the use of synchronization constructs such as the post and wait constructs used in parallel dialects of Fortran [...], the Ada rendezvous constructs, or the Java wait and notify constructs [...]. The goal is to determine that the synchronization actions temporally separate conflicting accesses to shared data. In
some cases it may be important to recognize that parallel tasks access disjoint 
regions of the same data structure. Researchers have developed many sophisti- 
cated techniques for extracting or verifying this kind of information. There are 
two broad categories: analyses that characterize the accessed regions of dense matrices [...], and analyses that extract or verify reachability properties of linked data structures. Although many of these analyses were originally developed for the automatic parallelization of sequential programs, the basic approaches should generalize to handle the appropriate
kinds of multithreaded programs. Researchers have also developed dynamic race 
detection algorithms, which monitor a running program to detect races in that 
specific execution [...], but provide no guarantees about other executions.
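The first of these approaches, checking a declared association between data and locks, can be sketched for straight-line code as follows. The declaration format and the instruction encoding are hypothetical; a realistic checker must also handle control flow, procedure calls, and aliasing between references, which is where most of the difficulty lies.

```python
def check_guarded_by(guarded_by, instructions):
    """Report accesses to guarded data made without holding the declared lock.

    guarded_by: the programmer's declaration, mapping each shared datum to its lock.
    instructions: straight-line list of ("acquire", lock), ("release", lock),
    ("read", datum) or ("write", datum) pairs for one thread.
    """
    held = set()
    violations = []
    for index, (op, name) in enumerate(instructions):
        if op == "acquire":
            held.add(name)
        elif op == "release":
            held.discard(name)
        elif name in guarded_by and guarded_by[name] not in held:
            violations.append((index, op, name))
    return violations


declarations = {"balance": "account_lock"}
thread_code = [("acquire", "account_lock"), ("read", "balance"), ("write", "balance"),
               ("release", "account_lock"), ("write", "balance")]
print(check_guarded_by(declarations, thread_code))  # [(4, 'write', 'balance')]
```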

Despite the sophistication of existing static techniques, the diversity and 
complexity of sharing patterns in multithreaded programs means that the static 
data race detection problem is still far from solved. In fact, as we discuss further 
in Section 4, we believe the ultimate solution for most programs will involve
an augmented type system that eliminates the possibility of data races at the 
language level. 



2.3 Deadlock Detection 

Researchers have developed a variety of analyses for detecting potential deadlocks in Ada programs that use rendezvous synchronization [...]. A rendezvous takes place between a call statement in one thread and
an accept statement in another. The analyses match corresponding calls and 
accepts to determine if every call will eventually participate in a rendezvous. If 
not, the program may deadlock. We note that deadlock traditionally arises from circular waiting to acquire resources, and is a classic problem
in multithreaded programs. In this context, programs typically use mutual ex- 
clusion synchronization rather than rendezvous synchronization. We expect that 
a deadlock detection analysis for programs that use mutual exclusion synchro- 
nization would obtain a partial order on the acquired resources and check that 
the program always respects this order. The order could be obtained from the 
programmer or extracted automatically from an analysis of the program. 
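A minimal sketch of this expected approach, assuming each thread is a straight-line sequence of acquire and release actions (the encoding and function names are hypothetical): record an edge a -> b whenever a thread acquires lock b while holding lock a. If this lock-order graph is acyclic, every thread respects the partial order it induces; a cycle flags a potential deadlock. The check is conservative, since it ignores whether the threads involved can actually run concurrently.

```python
def lock_order_graph(threads):
    """Edge (a, b) whenever some thread acquires lock b while holding lock a."""
    edges = set()
    for instructions in threads:
        held = []
        for op, lock in instructions:
            if op == "acquire":
                edges.update((h, lock) for h in held)
                held.append(lock)
            else:  # "release"
                held.remove(lock)
    return edges


def find_cycle(edges):
    """Return the locks on some cycle of the lock-order graph, or None if acyclic."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    state = {}  # lock -> "active" while on the DFS path, "done" afterwards
    path = []

    def visit(lock):
        state[lock] = "active"
        path.append(lock)
        for nxt in successors.get(lock, []):
            if state.get(nxt) == "active":
                return path[path.index(nxt):]  # back edge closes a cycle
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[lock] = "done"
        return None

    for lock in list(successors):
        if lock not in state:
            cycle = visit(lock)
            if cycle:
                return cycle
    return None


t1 = [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")]
t2 = [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")]
print(find_cycle(lock_order_graph([t1, t1])))          # None: A before B everywhere
print(sorted(find_cycle(lock_order_graph([t1, t2]))))  # ['A', 'B']
```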




Analysis of Multithreaded Programs 






3 Analysis Algorithms 

We next discuss some of the issues that arise when applying standard approaches 
to analyze multithreaded programs. We focus on ways to improve the efficiency 
of analyzing interactions between different threads. 



3.1 Dataflow Analysis for Multithreaded Programs 



Dataflow analysis performs an abstract interpretation of the program to discover program invariants at each program point [...]. Conceptually, one
can view these analyses as propagating information along control-flow paths, an 
approach that works reasonably well for sequential programs in part because 
each statement typically has few direct control-flow successors. The straightfor- 
ward generalization of this approach to multithreaded programs would propagate 
information between statements of parallel threads [...]. The issue is that the
direct control-flow successors of a statement in one thread typically include most 
if not all of the statements in all parallel threads. Propagating information along 
all of these potential control-flow edges leads to an algorithm with intractable 
execution times. The driving question is how to reduce the number of paths that 
the analysis must explicitly consider. 



Control-Flow Analysis. One approach is to analyze the program’s use of synchronization constructs to discover regions of tasks that cannot execute concurrently, then remove edges between these regions. The characteristics of the analysis depend on the specific synchronization constructs. Researchers have developed algorithms for programs that use the post and wait constructs used in parallel dialects of Fortran [...], for the Ada rendezvous constructs [...], and for the Java wait and notify constructs [...]. The basic idea behind
these algorithms is to match each blocking action (such as a wait or accept) 
with its potential corresponding trigger actions (such as post or notify) from 
other threads. The analysis uses the information to determine that the state- 
ments before the trigger action must execute before the statements after the 
blocking action. 
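A minimal sketch of the matching step for post and wait, under simplifying assumptions that are ours rather than those of the cited algorithms: straight-line threads, each event posted by exactly one instruction, and no transitive chaining of orderings. Statements before the unique post of an event must execute before statements after a wait on that event; everything else is left unordered. The encoding and function name are hypothetical.

```python
def must_precede(threads):
    """Pairs (s, t) such that statement s must execute before statement t.

    threads: list of instruction lists; each instruction is ("stmt", name),
    ("post", event) or ("wait", event).
    """
    posts = {}
    for tid, code in enumerate(threads):
        for i, (op, arg) in enumerate(code):
            if op == "post":
                posts.setdefault(arg, []).append((tid, i))

    order = set()
    for tid, code in enumerate(threads):
        for j, (op, event) in enumerate(code):
            if op != "wait" or len(posts.get(event, [])) != 1:
                continue  # unmatched or ambiguous wait: conclude nothing
            post_tid, post_index = posts[event][0]
            if post_tid == tid:
                continue  # same thread: program order already applies
            before = [name for o, name in threads[post_tid][:post_index] if o == "stmt"]
            after = [name for o, name in code[j + 1:] if o == "stmt"]
            order.update((s, t) for s in before for t in after)
    return order


t0 = [("stmt", "a"), ("post", "e"), ("stmt", "b")]
t1 = [("stmt", "c"), ("wait", "e"), ("stmt", "d")]
print(must_precede([t0, t1]))  # {('a', 'd')}
```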

In general, the algorithms for post and wait constructs are designed to work 
within parallel loops that access dense matrices. These programs use the post 
and wait constructs to ensure that a write to an array element in one parallel 
loop iteration precedes reads to that same element in other iterations. The tech- 
niques therefore focus on correlating the array accesses with the corresponding 
post and wait constructs that order them. The algorithms for the Ada ren- 
dezvous and Java wait and notify constructs tend to be most effective for 
programs in which the threads execute different code, enabling the analysis to 
distinguish between threads at the level of the code that each thread executes. 
We expect the algorithms to be less effective for server programs in which many 
threads execute the same code [...].
Coarsening the Analysis Granularity. Another way to reduce the analysis time is to collect adjacent instructions from threads into larger groups, then treat each group as a unit in the interthread analysis [...]. The typical approach is to collect together instructions that do not interact with other
threads; in this case the resulting coarsening of the analysis granularity does not 
affect the precision of the final analysis result. Because the relevant interactions 
usually take place at instructions from different threads that access the same 
data, the presence of references may significantly complicate the determination 
of which instructions may interact with other threads. One approach is to interleave a pointer analysis with the analysis that determines the instructions that may interact with other threads; another approach would use the results of a previous efficient pointer analysis to find these instructions (candidate analyses include flow-insensitive analyses [...] and analyses that do not analyze interleavings of instructions from different threads [...]).
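The grouping step can be sketched as follows, assuming (hypothetically) that each instruction is annotated with the locations it may access and that an earlier pointer or escape analysis has produced the set of locations other threads may access. Runs of purely local instructions collapse into one unit, so in the example the interthread analysis sees three units instead of five.

```python
def coarsen(instructions, shared):
    """Group one thread's instructions into units for interthread analysis.

    instructions: list of (name, locations the instruction may access).
    shared: locations other threads may access, assumed to come from an
    earlier pointer or escape analysis.  Maximal runs of instructions that
    touch no shared location become a single unit; each instruction that does
    touch shared data stays a unit of its own.
    """
    units, run = [], []
    for name, locations in instructions:
        if shared & set(locations):
            if run:
                units.append(run)
                run = []
            units.append([name])
        else:
            run.append(name)
    if run:
        units.append(run)
    return units


thread = [("i1", ["t1"]), ("i2", ["t1", "t2"]), ("i3", ["q"]),
          ("i4", ["t3"]), ("i5", ["t4"])]
print(coarsen(thread, shared={"q"}))  # [['i1', 'i2'], ['i3'], ['i4', 'i5']]
```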



Interference-Based Analyses. Interference-based analyses maximally coarsen 
the analysis granularity — they analyze each thread as a unit to compute a result 
that characterizes all potential interactions with other threads. The extracted 
analysis information then flows from the end of each thread to the beginning of 
all other parallel threads. For standard bitvector analyses such as live variables 
and reaching definitions, this approach somewhat surprisingly delivers an effi- 
cient algorithm with the same precision as an algorithm that explicitly analyzes 
all possible interleavings. For more complicated analyses such as pointer analysis, existing algorithms based on this approach overestimate the effect of potential interactions between threads and lose precision [...]. Finally, if the language semantics rules out the possibility of interactions between tasks, analyzing each task as a unit is clearly the correct way to proceed [...].
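For reaching definitions, the interference-based approach can be sketched as follows, assuming straight-line threads without synchronization (a simplification of ours; the function name is hypothetical). Each thread is summarized as a unit by the definitions it generates; at every point in a thread, the definitions of all parallel threads are added, while only the thread's own definitions can kill. For straight-line threads this coincides with what enumerating every interleaving would produce, since any parallel definition can be scheduled immediately before the point in question.

```python
def reaching_definitions(threads):
    """Interference-based reaching definitions for straight-line parallel threads.

    threads[t] lists the variable assigned by each statement of thread t, and a
    definition is the triple (t, i, var).  Returns rd with rd[t][i] the set of
    definitions that may reach the point just before statement i of thread t
    (i == len(threads[t]) is the thread's exit).
    """
    # Analyze each thread as a unit: summarize it by every definition it generates.
    gen = [{(t, i, v) for i, v in enumerate(code)} for t, code in enumerate(threads)]
    rd = []
    for t, code in enumerate(threads):
        # Any definition of a parallel thread can reach any point of thread t,
        # because some interleaving runs it immediately before that point.
        interference = set().union(*(gen[u] for u in range(len(threads)) if u != t))
        local, points = set(), []
        for i in range(len(code) + 1):
            points.append(local | interference)
            if i < len(code):
                # Only thread t's own definitions kill: a parallel thread's write
                # kills a definition in some interleavings but not in all of them.
                local = {d for d in local if d[2] != code[i]} | {(t, i, code[i])}
        rd.append(points)
    return rd


rd = reaching_definitions([["x", "y", "x"], ["x"]])
print(sorted(rd[0][3]))  # [(0, 1, 'y'), (0, 2, 'x'), (1, 0, 'x')]
```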

3.2 Flow-Insensitive Analyses 

Unlike dataflow analyses, flow-insensitive analyses produce the same result regardless of the order in which the statements appear in the program or the number of times that they are executed [...]. They therefore trivially extend to
handle multithreaded programs. The analysis results can be used directly or as a 
foundation to enhance the effectiveness of more detailed flow-sensitive analyses. 
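To illustrate why flow-insensitive analyses extend trivially, here is a minimal Andersen-style (inclusion-based) points-to analysis; Andersen's analysis is our choice of example, and the statement encoding is hypothetical. Because it iterates over the statements to a fixed point without regard to their order, the statements of all threads can simply be pooled, and any pooling order yields the same result.

```python
def points_to(statements):
    """Andersen-style, flow-insensitive points-to analysis.

    Each statement is (kind, lhs, rhs) with kind one of
      "addr"  lhs = &rhs      "copy"  lhs = rhs
      "load"  lhs = *rhs      "store" *lhs = rhs
    """
    pts = {}

    def add(var, targets):
        """Add targets to pts[var]; report whether anything was new."""
        current = pts.setdefault(var, set())
        new = targets - current
        current |= new
        return bool(new)

    changed = True
    while changed:  # iterate to a fixed point; statement order is irrelevant
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                changed |= add(lhs, {rhs})
            elif kind == "copy":
                changed |= add(lhs, set(pts.get(rhs, ())))
            elif kind == "load":
                for target in list(pts.get(rhs, ())):
                    changed |= add(lhs, set(pts.get(target, ())))
            elif kind == "store":
                for target in list(pts.get(lhs, ())):
                    changed |= add(target, set(pts.get(rhs, ())))
    return pts


thread_a = [("addr", "p", "x"), ("copy", "q", "p")]                        # p = &x; q = p
thread_b = [("addr", "r", "y"), ("store", "q", "r"), ("load", "s", "p")]  # r = &y; *q = r; s = *p
result = points_to(thread_a + thread_b)
print(result == points_to(thread_b + thread_a))  # True
print(result["s"])                                # {'y'}
```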



3.3 Challenges 

The primary challenge for analyzing multithreaded programs remains develop- 
ing abstractions and analyses that precisely characterize interactions between 
threads. For explicit interactions that take place at synchronization constructs, 
the primary goal is to match interacting pairs of constructs. For implicit inter- 
actions that take place at memory locations accessed by multiple threads, the 
primary goal is to find instructions that access the same memory locations, then 
characterize the combined effect of the instructions. The use of dynamic memory 
allocation, object references, and arrays significantly complicates the analysis of
these implicit interactions because they force the analysis to disambiguate the 
accesses to avoid analyzing interactions that never occur when the program 
runs. The problem is especially acute for programs that use references because 
interactions between instructions that access references may, in turn, affect the 
locations that other instructions access. One of the main challenges is therefore 
to develop efficient disambiguation analyses for multithreaded programs. We 
see several potential foundations for these analyses: an augmented type system 
(see Section 4), efficient interference-based or flow-insensitive pointer analyses,
or exploiting structured control constructs such as parallel loops to confine the 
concurrency to a small part of the program and enable the use of very precise, 
detailed analyses. 

Many existing analyses assume a very simple model of multithreaded exe- 
cution characterized by the absence of one or more of dynamic object creation, 
dynamic thread creation, references to objects (including thread objects), and 
procedure or method calls. Given the pervasive use of these constructs in many 
multithreaded programs, an important challenge is to develop algorithms that 
can successfully analyze programs that use these constructs. 

4 Data Race Freedom in Activity Management Programs 

Given the problems associated with data races and the current inability of au- 
tomated techniques to verify that a range of programs are free of data races, 
techniques that guarantee data race freedom are of interest. The primary issue 
that shapes the field is the reason for using multiple threads and the resulting 
data usage patterns of the program. In this section we focus on activity manage- 
ment programs, or programs that use threads to manage a set of conceptually 
parallel activities such as interacting with a remote client [...]. Because of
the loose connection between the computations of the threads, these programs 
typically use an unstructured form of concurrency in which each thread executes 
independently of its parent thread. These programs typically manipulate several 
different kinds of data with different synchronization requirements. To success- 
fully verify data race freedom for these programs, the implementation must take 
these differences into account and use algorithms tailored for the properties that 
are relevant for each kind of data. 

— Private Data: Data accessed by only a single thread. 

— Inherited Data: Data created or initialized by a parent thread, then passed 
as a parameter to a child thread. Once the child thread starts its execution,
the parent thread no longer accesses the data. 

— Migrating Data: Data that is passed between parallel threads, often as 
part of producer/consumer relationships. Although multiple threads access 
migrating data, at each point in time there is a single thread that has con- 
ceptual ownership of the data and no other threads access the data until 
ownership changes. 
— Published Data: Data that is initialized by a single thread, then distributed
to multiple reader threads for read-only access. 

— Mutex Data: Data that is potentially accessed and updated by multiple 
parallel threads, with the updates kept consistent with mutual exclusion 
synchronization.

— Reader/Writer Data: An extension of mutex data to support concurrent
access by readers and exclusive access by writers. 

Program actions temporally separate accesses from different threads and en- 
sure data race freedom. For inherited data, the thread creation action separates 
the parent accesses from the child accesses. For mutex and reader/writer data,
the lock acquire and release actions separate accesses from different threads. For 
published data, the action that makes a reference to the data accessible to mul- 
tiple reader threads separates the writes of the initializing thread from the reads 
of the reader threads. For migrating data, the actions that transfer ownership of 
the data from one thread to the next separate the accesses. Mutex, published, 
and migrating data often work together to implement common communication 
patterns in multithreaded programs. For example, a shared queue usually con- 
tains mutex data (the queue header) and migrating data (the elements of the 
queue). 
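The shared-queue example can be sketched in Python as follows (class and function names are hypothetical; Python's standard queue.Queue already provides this behavior, and we spell it out only to make the two kinds of data visible). The deque is mutex data guarded by the condition variable's lock; each job is migrating data whose ownership passes from producer to consumer at put() and get(). Nothing in Python enforces the producer's promise not to touch a job after put(); that gap is exactly what the techniques in this section aim to close.

```python
import threading
from collections import deque


class SharedQueue:
    """The deque is mutex data, guarded by the condition variable's lock."""

    def __init__(self):
        self._items = deque()
        self._nonempty = threading.Condition(threading.Lock())

    def put(self, item):
        with self._nonempty:          # exclusive access to the queue header
            self._items.append(item)  # ownership of item passes to the queue
            self._nonempty.notify()

    def get(self):
        with self._nonempty:
            while not self._items:
                self._nonempty.wait()
            return self._items.popleft()  # ownership passes to the caller


def producer(queue, count):
    for n in range(count):
        job = {"id": n, "payload": [n] * 3}  # private while being built
        queue.put(job)                       # migrates to the consumer
        # By convention only: the producer no longer touches job after put().


def consumer(queue, count, results):
    for _ in range(count):
        job = queue.get()          # the consumer is now the sole owner
        job["payload"].append(-1)  # unsynchronized write is safe: no other owner
        results.append(job["id"])


q, results = SharedQueue(), []
workers = [threading.Thread(target=producer, args=(q, 5)),
           threading.Thread(target=consumer, args=(q, 5, results))]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(results)  # [0, 1, 2, 3, 4]
```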

Given the diversity of the different kinds of data and the complexity of their 
access patterns, we believe it will be extremely difficult for any analysis to au- 
tomatically reconstruct enough information to verify data race freedom in the 
full range of activity management programs. We therefore focus on language 
mechanisms that enable the programmer and the analysis to work together to 
establish that the program is free of data races. 

4.1 Augmented Type Systems for Race-Free Programs

Many of the first researchers to write multithreaded programs were acutely aware 
of the possibility of data races, and developed languages that prevented the 
programmer from writing programs that contained them. The basic idea was to 
force each thread to acquire exclusive ownership of data before writing it, either 
by acquiring a lock on the data or by ensuring that the data is inaccessible 
to other threads. Concurrent Pascal, for example, carefully limits the use of 
references to ensure that the sharing between threads takes place only via data 
copied into and out of mutex data encapsulated in monitors [...]. In effect, the
language uses copy operations to convert migrating, inherited, and published 
data into private data. Because these copy operations take place in the context 
of a synchronized update to mutex data, they execute atomically with respect to 
the threads sharing the data. It is possible to generalize this approach to handle 
a wider range of data structures, including linked data structures containing 
references [...].
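A minimal Python sketch of the copying discipline (CopyingMonitor is a hypothetical name, and copy.deepcopy stands in for the language-level copy operations): values are copied into and out of the monitor while its lock is held, so no thread ever holds a reference to the monitor's internal data. Because copy.deepcopy follows references, the same idea covers linked structures, at the cost of copying them in full.

```python
import copy
import threading


class CopyingMonitor:
    """Monitor-style cell that only ever hands out private copies of its contents."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = copy.deepcopy(value)

    def store(self, value):
        with self._lock:
            self._value = copy.deepcopy(value)  # copy in: caller keeps no alias

    def fetch(self):
        with self._lock:
            return copy.deepcopy(self._value)   # copy out: result is private


cell = CopyingMonitor({"items": [1, 2]})
mine = cell.fetch()
mine["items"].append(3)  # mutates a private copy only
print(cell.fetch())      # {'items': [1, 2]}
```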

Another approach is to provide an augmented type system that enables the 
programmer to explicitly identify shared data accessible to multiple threads. Each piece of shared data is associated with a mutual exclusion lock, and the type system enforces the constraint that the program holds the associated lock whenever it accesses the corresponding shared data. The type system
may also support a variety of other kinds of data that can be safely accessed 
without synchronization; examples include private data accessible to only a sin- 
gle thread, constant data that is never modified once it has been initialized, and 
value data that may be copied into and out of shared data. It is also possible 
to use a linear type system to ensure the existence of at most one reference to 
a given piece of data, with the data owned by the thread that holds its reference [...]. In this scenario, the movements of inherited and migrating data
between threads correspond to acquisitions and releases of the unique reference 
to the moving data. 
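Python has no linear types, but the ownership discipline can be approximated at run time. In the hypothetical Unique wrapper below, move() transfers the referent to a fresh handle and invalidates the old one, so at most one usable handle exists; handing the moved handle to a child thread models inherited data. A linear type system would reject the stale handle's later use at compile time rather than raising an exception.

```python
import threading


class Unique:
    """Runtime stand-in for a linear reference: at most one valid handle."""

    def __init__(self, value):
        self._value = value
        self._valid = True

    def get(self):
        if not self._valid:
            raise RuntimeError("use of a moved reference")
        return self._value

    def move(self):
        """Transfer ownership: return a new handle and invalidate this one."""
        value = self.get()
        self._value, self._valid = None, False
        return Unique(value)


def child_work(handle, out):
    out.append(sum(handle.get()))  # the child is now the only owner


parent_handle = Unique([1, 2, 3])
results = []
child = threading.Thread(target=child_work, args=(parent_handle.move(), results))
child.start()
child.join()
print(results)  # [6]

try:
    parent_handle.get()  # the parent gave up ownership at move()
except RuntimeError as err:
    print(err)           # use of a moved reference
```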

In spirit, these type systems extend the basic safe monitor approach devel- 
oped in the 1970s to work for modern languages with linked data structures. 
The key challenge is controlling the use of references to eliminate the possibil- 
ity of inadvertently making unsynchronized data reachable to multiple threads 
concurrently. Note that the most general solution to this problem would be to 
track all references to inherited, migrating, or published data and verify that 
threads did not use these references to incorrectly access the data. The difficulty 
of solving this general problem inspired the variety of other, more constrained, 
solutions described above. 

4.2 Future Directions in Augmented Type Systems 

The next step is to use some combination of language design and program anal- 
ysis to better understand the referencing behavior of the program and support a 
wider range of thread interaction patterns. We anticipate that the implementa- 
tion will focus on inherited, migrating, and published data. We view the situation 
for mutex and reader/writer data as comparatively settled: current type systems
or their relatively straightforward generalizations should be adequate for ensur- 
ing that mutex data is correctly synchronized. The implementation will therefore 
focus on extracting or verifying the following kinds of information: 

— Reachability: We anticipate that the implementation will use reachability 
information to verify the correct use of private, migrating, and inherited 
data. Specifically, it will verify that private data is reachable only from the 
thread that initially created the data and that when an ownership change 
takes place for inherited or migrating data, the data is inaccessible to the 
previous owner. 

— Write Checking: For published data, which is reachable from multiple threads,
the implementation must verify that the data is never written once it be- 
comes accessible to multiple threads. There are two key components: identi- 
fying the transition from writable to read only, and verifying the absence of 
writes after the transition. 

For reader/writer data, we anticipate that programmers will use locking constructs that enable reads to execute concurrently but serialize writes with
respect to all other accesses. The implementation must verify that all reads 
are protected by a held read lock and all writes are protected by a held write
lock. 
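The reachability checks can be sketched over a concrete heap snapshot, an assumption made purely for illustration; a static implementation would reason about an abstract heap graph instead. In the hypothetical helpers below, the heap maps each object to the objects it references, and each thread is represented by its root references.

```python
def reachable(heap, roots):
    """Objects reachable from roots; heap maps each object to the objects it references."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(heap.get(obj, ()))
    return seen


def transfer_ok(heap, old_owner_roots, moved):
    """Inherited or migrating data: after the ownership change, no part of the
    moved structure may remain reachable from the previous owner."""
    return not (reachable(heap, [moved]) & reachable(heap, old_owner_roots))


def private_ok(heap, roots_by_thread, owner, obj):
    """Private data: obj is reachable from no thread other than its creator."""
    return all(obj not in reachable(heap, roots)
               for thread, roots in roots_by_thread.items() if thread != owner)


heap = {"producer_local": ["buffer"], "job": ["node1"], "node1": ["node2"]}
print(transfer_ok(heap, ["producer_local"], "job"))   # True
leaky = {**heap, "buffer": ["node2"]}                 # producer kept a pointer into the job
print(transfer_ok(leaky, ["producer_local"], "job"))  # False
```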



4.3 Impact on Other Analyses 

Because the augmented type information would enable the analysis to dramat- 
ically reduce the number of potential interthread interactions that it must con- 
sider, we expect it to enable researchers to develop quite precise and practical 
analyses that extract or verify detailed properties of the shared data. We antici- 
pate an approach that divides the program into atomic regions that access only 
shared or private data, then analyzes the program at the granularity of these 
regions. The analysis would analyze sequential interactions between regions from 
the same thread and some subset of the interleaved interactions between regions 
from different threads that access the same data, obtaining a result valid for all 
interleavings that might occur when the program runs. In effect, the analysis 
would view each region as an operation on shared or private data. Potentially 
extracted or verified properties include representation invariants for shared data, 
monotonicity properties of operations on shared data, and recognition of sets of 
commuting operations on shared data. 
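As a small illustration of the last property, the check below tests whether two region-level operations on shared state commute by running them in both orders from a set of sample states. This is only testing, not the verification the text anticipates, and the operations are hypothetical; a static analysis would need to establish commutativity for all states, for example symbolically.

```python
def commute_on(op1, op2, states):
    """True if op1 then op2 equals op2 then op1 on every sample state."""
    return all(op2(op1(s)) == op1(op2(s)) for s in states)


def deposit(amount):
    """Region that adds amount to the shared balance (state is a dict)."""
    return lambda s: {**s, "balance": s["balance"] + amount}


def double_balance(s):
    return {**s, "balance": s["balance"] * 2}


samples = [{"balance": b} for b in (0, 1, 100)]
print(commute_on(deposit(10), deposit(20), samples))     # True: additions commute
print(commute_on(deposit(10), double_balance, samples))  # False: (0 + 10) * 2 != 0 * 2 + 10
```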

4.4 Adoption Prospects 

For activity management programs, we anticipate that it will be both technically 
feasible and valuable to develop an expressive augmented type system that guar- 
antees data race freedom. The key question is whether such a type system would 
be accepted in practice. Factors that would influence its acceptance include how 
widespread multithreaded programming becomes, the ability of programmers to 
develop programs without data races in the absence of such a type system, the
consequences of the data races programmers leave in the code, how well the ex- 
tended type system supports the full range of thread interaction patterns, and 
whether programmers perceive the extended information as a burden or a bene- 
fit. One potential approach might separate the extended type information from 
the rest of the program, enabling programmers to use the standard type system for
sequential programs and the extended system for multithreaded programs. An- 
other approach might provide standard defaults that work for most cases, with 
the programmer adjusting the defaults only when necessary. We note that over 
time, sequential languages have moved towards providing more safety guaran- 
tees, which argues for acceptance of increased safety in multithreaded languages. 

5 Data Race Freedom in Parallel Computing Programs 

Parallel computing programs use threads to subdivide a single computational 
task into multiple parallel subtasks for execution on a multiprocessor. Unlike 
activity management programs, parallel computing programs often execute a se- 
quence of steps, with the concurrency exploited within but not between steps. 
The structure therefore closely corresponds to the structure one would use for
a sequential program that performed the same computation. Because different 
steps may use the same piece of data in different ways, it is crucial for the im- 
plementation to identify the threads in different phases and treat each phase 
separately. The difficulty of identifying parallel phases depends on the specific 
concurrency generation constructs. If the program uses long-lived threads that 
persist across steps but periodically synchronize at a barrier, reconstructing the 
structure is a challenging analysis problem [5]. If the program uses structured
control constructs such as parallel loops or recursively generates parallel com- 
putations in a divide and conquer fashion P2], the parallel phases are obvious 
from the syntactic structure of the program. 



Parallel computing programs use many of the same kinds of data as ac- 
tivity management programs. An additional complication is the fact that the 
parallel tasks often access disjoint parts of the same data structure. Over the 
years researchers have developed many sophisticated techniques for extracting 
or verifying this kind of information, both for programs that access dense ma- 
trices jSlrSI7l5(177RlRI,*I8l47ls4j and for programs that manipulate linked data 
structures hiSI. Parallel computing programs may also use reductions
and commuting operations, in which case it may be important to generalize
algorithms from the field of automatic parallelization to verify that the program
executes deterministically |8Ii44l48l8()| . In general, the programmer can reason- 
ably develop programs with quite sophisticated access patterns and data struc- 
tures, with the data race freedom of the program depending on the detailed 
properties of the data structures and the algorithms that manipulate them. It 
therefore seems unlikely that a general approach would be able to verify data 
race freedom for the full range of parallel computing programs. 
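For the dense-matrix case, one classical textbook check (shown only as an illustration, not necessarily one of the techniques cited above) is the GCD test. Two accesses A[a*i + b] and A[c*j + d] can touch the same element only if gcd(a, c) divides d - b, so when it does not, parallel tasks performing those accesses are provably disjoint. The function name and the one-dimensional affine subscript form are assumptions for this sketch, and because loop bounds are ignored, a "may overlap" answer is conservative.

```python
from math import gcd


def may_overlap(a: int, b: int, c: int, d: int) -> bool:
    """GCD test for A[a*i + b] vs. A[c*j + d] over unbounded integers i, j.

    Returns False only when no integers i, j satisfy a*i + b == c*j + d,
    i.e. the two accesses are provably disjoint. Loop bounds are ignored,
    so True means "may overlap" (a conservative answer), not "overlaps".
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they overlap only if they are equal.
        return b == d
    return (d - b) % g == 0


# Task 1 writes A[2*i] (even elements), task 2 writes A[2*j + 1] (odd elements).
print(may_overlap(2, 0, 2, 1))  # False: provably disjoint
print(may_overlap(2, 0, 4, 2))  # True: e.g. i = 1, j = 0 both touch A[2]
```

Real dependence analyzers combine such tests with bounds and multi-dimensional subscripts; the point here is only that disjointness can follow from simple arithmetic facts about the access patterns.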



Because of the close correspondence between the parallel and sequential ver- 
sions of the program, it is often useful to view the threading constructs in parallel 
computing programs as annotations that express the programmer’s expectations 
about the lack of dependences between parts of the program rather than as con- 
structs that must generate parallel computation to preserve the semantics of the 
program. In this context, the analysis problem would be framed as a sequential 
program analysis that determines whether the identified parts of the program 
lack dependences. An advantage of this approach is that it eliminates the need 
to analyze interactions between parallel threads. 
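A minimal sketch of this framing, assuming that some earlier sequential analysis has already summarized each iteration of an annotated parallel loop as a set of locations read and a set of locations written (the `Access` type and the location strings are hypothetical). The annotation is justified when no iteration writes a location that another iteration reads or writes.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Access:
    """Hypothetical per-iteration summary of the memory locations touched."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def independent(iterations: list) -> bool:
    """True if no two iterations have a write/read or write/write conflict.

    Under this condition, running the iterations in parallel cannot change
    the result computed by the sequential loop.
    """
    for x, y in combinations(iterations, 2):
        if x.writes & (y.reads | y.writes) or y.writes & x.reads:
            return False
    return True


# for i in 0..3: B[i] = A[i] + 1    -- each iteration writes its own B[i]
ok = [Access({f"A[{i}]"}, {f"B[{i}]"}) for i in range(4)]
# for i in 1..3: A[i] = A[i-1] + 1  -- iteration i reads what i-1 wrote
bad = [Access({f"A[{i-1}]"}, {f"A[{i}]"}) for i in range(1, 4)]
print(independent(ok), independent(bad))  # True False
```

No interleavings of threads are examined: the check is a purely sequential dependence question, which is the advantage the text describes.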



In general, we view guaranteed data race freedom as both less feasible and 
potentially less important for parallel computing programs than for activity 
management programs. It is less feasible because it may depend on very de- 
tailed properties of arbitrarily sophisticated array access patterns or linked data 
structures. It is potentially less important because the parallelism tends to be 
confined within single parallel algorithms rather than operating across the entire 
execution of the program. While the algorithms in parallel computing programs 
may have very complicated internal structure, the fact that the potential inter- 
actions can be localized significantly increases the programmer’s ability to avoid 
inadvertent data races. Somewhat paradoxically, these properties raise the value 
of automatic program analysis algorithms that can verify the data race freedom
of parallel computing programs. There is room for a suite of targeted analyses, 
each of which is designed to analyze programs that access a certain kind of data 
in a certain way. The ability to confine the concurrency within a small part of 
the program makes it feasible to use very detailed, precise analyses. 

6 Weak Memory Consistency Models 

For a variety of performance reasons, many implementations of multithreaded 
languages have a weak memory consistency model that allows the implementa- 
tion to change the order in which writes from one thread are observed in parallel 
threads [T|ZE|. Moreover, standard weak consistency models enable executions in 
which different threads observe different orders for the same sequence of writes 
from a parallel thread. Weak consistency models are often considered to be coun- 
terintuitive because they break the abstraction of a single memory accessed by 
sequentially executing threads m- 

One might wonder how programmers are expected to successfully develop 
programs in languages with weak memory consistency models. Conceptually, 
weak consistency models do not reorder writes across synchronization operations. 
So the intention is that programmers will write properly synchronized, data-race 
free programs and never observe the reorderings. It is worth noting that weak 
consistency models are complex enough that researchers are still in the process of 
developing a rigorous semantics for them P3EE1. And the proposed semantics are 
significantly more complicated than the standard semantics for multithreaded 
programs, which simply interleave the statements from parallel threads. 
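The standard store-buffering example makes the difference concrete. Thread 1 executes `x = 1; r1 = y` and thread 2 executes `y = 1; r2 = x`, with x and y initially 0. The sketch below enumerates every execution under two toy models: plain interleaving, and a model in which each thread's stores enter a private FIFO buffer that drains to memory at arbitrary later points. The buffered model is a simplified TSO-like model chosen for illustration, not the semantics of any particular language; only it admits r1 = r2 = 0.

```python
# Each thread is a list of operations: ("st", var, value) or ("ld", var, reg).
T1 = [("st", "x", 1), ("ld", "y", "r1")]
T2 = [("st", "y", 1), ("ld", "x", "r2")]


def replace(tup, i, item):
    """Return a copy of tuple `tup` with position i set to `item`."""
    return tup[:i] + (item,) + tup[i + 1:]


def outcomes(threads, buffered):
    """Enumerate final register assignments over all executions.

    buffered=False: interleaving semantics; stores update memory at once.
    buffered=True: each thread has a FIFO store buffer; a load reads the
    thread's newest buffered store to that variable, else memory, and any
    buffer may flush its oldest entry to memory at any step.
    """
    variables = sorted({op[1] for ops in threads for op in ops})
    results = set()

    def write(mem, var, val):
        return tuple(sorted({**dict(mem), var: val}.items()))

    def run(pcs, bufs, mem, regs):
        progressed = False
        for t, ops in enumerate(threads):
            if pcs[t] < len(ops):  # thread t executes its next operation
                kind, var, arg = ops[pcs[t]]
                npcs = replace(pcs, t, pcs[t] + 1)
                if kind == "st" and buffered:
                    run(npcs, replace(bufs, t, bufs[t] + ((var, arg),)), mem, regs)
                elif kind == "st":
                    run(npcs, bufs, write(mem, var, arg), regs)
                else:
                    own = [v for (w, v) in bufs[t] if w == var]
                    val = own[-1] if own else dict(mem)[var]
                    run(npcs, bufs, mem, tuple(sorted(regs + ((arg, val),))))
                progressed = True
            if bufs[t]:  # the oldest buffered store of thread t reaches memory
                (var, val), rest = bufs[t][0], bufs[t][1:]
                run(pcs, replace(bufs, t, rest), write(mem, var, val), regs)
                progressed = True
        if not progressed:
            results.add(regs)

    n = len(threads)
    run((0,) * n, ((),) * n, tuple((v, 0) for v in variables), ())
    return results


def show(res):
    """List of (r1, r2) value tuples, sorted."""
    return sorted(tuple(v for _, v in regs) for regs in res)


print("interleaving:", show(outcomes([T1, T2], buffered=False)))
# interleaving: [(0, 1), (1, 0), (1, 1)]
print("store buffers:", show(outcomes([T1, T2], buffered=True)))
# store buffers: [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome (0, 0) is exactly the kind of behavior an analysis that assumes the interleaving model never considers.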



6.1 Short-Term Program Analysis Opportunities 

In the short term, weak memory consistency models will be a fact of life for 
developers of multithreaded software. Most modern processors implement weak 
consistency models in hardware, and Java specifies a weak consistency model 
for multithreaded programs, in part because if threads can access shared data 
without synchronization, many standard compiler optimizations may change the 
order in which threads perform (and other threads potentially observe) accesses 
to shared data m- In this context, the alternative to a weak consistency model 
is to disable these optimizations unless the compiler performs the global analysis 
required to determine that parallel threads do not observe the reordered mem- 
ory accesses |5Df64j . Requiring the extraction of this kind of global information 
as part of the standard compilation process is clearly problematic, primarily 
because it rules out optimized separate compilation. 

Another approach is to develop analyses and transformations that restore the 
abstraction of a single consistent shared memory with no reordered writes. The 
basic idea is to analyze the program, discover situations in which the threads 
may observe reordered writes, then augment the program with additional in- 
structions that prevent the hardware from reordering these writes [9116 Jj . This 
research holds out the promise of providing the efficiency of a weak memory
consistency model in the implementation combined with the abstraction of a 
single shared memory for the programmer. Because programs do not observe 
the effect of a weak consistency model unless they access shared data without 
explicit synchronization, we see these techniques as appropriate primarily for 
low-level programs that synthesize their own custom synchronization operations 
out of shared memory. 
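A deliberately crude version of the "analyze, then add instructions" idea can be sketched for a store-buffer (TSO-like) model, which is an assumption of this sketch. Under that model, the only reordering a thread exhibits is a store being overtaken by a later load of a different variable. Inserting a fence, assumed to stall the thread until its store buffer drains, before every such load therefore restores interleaving behavior. The analyses the text refers to are more precise and insert fences only where another thread can actually observe the reordering. The operation encoding and the `FENCE` marker are hypothetical.

```python
FENCE = ("fence",)


def insert_fences(ops):
    """Conservatively insert fences into one thread's operation list.

    ops: list of ("st", var, value), ("ld", var, reg) or ("fence",).
    A fence is placed before a load whenever a store to some *other*
    variable has been issued since the last fence (a load of the same
    variable is served from the thread's own buffer, so it needs none).
    """
    out = []
    pending = set()  # variables stored since the last fence
    for op in ops:
        if op[0] == "ld" and pending - {op[1]}:
            out.append(FENCE)
            pending.clear()
        if op[0] == "st":
            pending.add(op[1])
        elif op[0] == "fence":
            pending.clear()
        out.append(op)
    return out


# Thread 1 of the store-buffering example: x = 1; r1 = y
print(insert_fences([("st", "x", 1), ("ld", "y", "r1")]))
# [('st', 'x', 1), ('fence',), ('ld', 'y', 'r1')]
```

With a fence in each thread of the store-buffering example, neither load can execute before the same thread's store reaches memory, so the outcome r1 = r2 = 0 disappears.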



6.2 Impact on Existing Analysis Algorithms 



Almost all existing analyses for multithreaded programs assume an interleav- 
ing model of concurrency. But weak consistency models generally increase the 
set of possible program behaviors as compared with the standard interleaving 
model, raising the possibility that existing analyses are unsound in the pres- 
ence of weak consistency models. Furthermore, the complexity of the semantics 
for programs with weak consistency models increases the difficulty of developing 
provably sound analyses for these programs. We suspect that many existing analyses
are sound for programs with weak consistency models [41^)31371*^/1^131], but
this soundness is clearly inadvertent, in some cases a consequence of imprecision 
in the analysis, and not necessarily obvious to prove formally. 

We expect the difficulty of dealing with weak memory consistency models 
to inspire multiphase approaches. The first phase will either verify the absence 
of data races or transform the program to ensure that it does not observe any 
of the possible reorderings. The subsequent phases will then assume the sim- 
pler interleaving model of concurrency. Another alternative would be to use an 
augmented type system that guarantees race-free programs (see Section 4). The
analysis could use the type information to identify regions within which it could 
aggressively reorder accesses to optimize the program without changing the re- 
sult that the program computes. 



7 Conclusion 

Multithreaded programs are significantly more complicated to analyze than se- 
quential programs. Many analyses have focused on characterizing interactions 
between threads to detect safety problems such as data races and deadlock or to 
hide anomalies associated with weak memory consistency models. Future direc- 
tions include generalizing abstractions and analyses to better handle constructs 
such as dynamically allocated memory, dynamic thread creation, procedures and 
methods, and threads as first-class objects. We also anticipate the further de- 
velopment of augmented type systems for race-free programs, which will reduce 
the potential interthread interactions that the analysis must consider and enable 
the development and use of more detailed, precise analyses. 
Abstract. Transformation rules of imperative concurrent programs, based
on congruence and refinement relations between statements, are presented.
They introduce and/or eliminate synchronous communication
statements and parallelism in these programs. The development is made 
within a subset of SPL, a good representative of imperative notations 
for concurrent and reactive programs introduced by Manna and Pnueli. 
The paper shows that no finite set of transformation rules suffices to 
eliminate synchronous communication statements from programs invol- 
ving the concatenation and parallelism operators only. An infinite set 
is given to suit this purpose, which can be applied recursively. As an 
important complement for the applications, a collection of tactics, for 
the acceleration of broader transformations, is described. Tactics apply 
a sequence of rules to a program with a specific transformation objective. 
The transformation rules and the tactics could be used in formal design 
to derive new programs from verified ones, preserving their properties, 
and avoiding the repetition of verifications for the transformed programs. 
As an example, the formal parallelization of a non-trivial distributed fast 
Fourier transform algorithm is outlined. 



1 Introduction 

A mathematical basis for the application of transformations to the design of pa- 
rallel and distributed programs is introduced. The specific scenario of equivalence 
preserving formal transformations, introducing and/or eliminating synchronous 
communications and parallelism, is treated. We work with two broad transformations:
formal parallelization and formal communication simplification. The
former introduces internal synchronous communication and parallelism whereas 
the latter removes both. The basic constituent steps of formal parallelization and 
communication simplification are simple transformations, which have to corre- 
spond to congruences or other equivalence preserving relations between program 
statements. Their soundness and meaning have to be established mathematically.
Some sets of equivalence laws for concurrent programs are available in
the literature, particularly in the area of process algebras. The books of Hoare 
and Milner 1 1 contain some such sets. In the area of static analysis, trans- 
formations have been reported in |2j and methods to derive parallel codes have 
been treated in HH. The works of Lamport and Hooman, reported in m and 
m respectively, are related to the topic. However, these approaches differ from
the one reported in the present work. The general problem of program
analysis for communication elimination and introduction, at the syntactic level,
of imperative programs is not treated in the above works and, to the best of our
knowledge, has not been dealt with elsewhere in the literature either. Motivated by
practical design, we wish to work at the program level, rather than at the process or
specification levels. Therefore we are interested neither in process algebras nor 
in action based systems, but rather in program state based transformations. The 
transformations have to work for a concrete programming notation, which has 
to be chosen. It has to incorporate explicit parallelism and communication via 
channels. A good representative of this class is the simple programming language 
(SPL) used in the framework of Manna and Pnueli, presented in the books 
P321. SPL programs can be verified and/or model checked in the Stanford 
Temporal Prover (STeP) p5l4j . SPL is similar to CSP PE] and OCCAM ^7j. 
This in no way restricts the transformation approach to this specific framework. 
The transformations preserve properties and can be used in cooperation with 
other model checkers and program verifiers, such as for instance SPIN and 
SMV |2S|, or in a broader design scenario where none of these tools is used but 
where the transformations would just allow the derivation of new programs from 
existing ones, by reusing and transforming them. As an illustration, consider the 
following SPL program, which has top-level parallel (cooperating) processes P_a,
P and P_b. The inner process P is a concatenation whose first substatement is
a parallel composition of two synchronous communication substatements over
channels α and β. They match synchronous communication substatements
in processes P_a and P_b, respectively.



local a, b, x, y, z : integer
local α, β : channel of integer

P_a :: [produce a; α ⇐ a]
    ||
P   :: [[α ⇒ x || β ⇒ y]; z := x + y; consume z]
    ||
P_b :: [produce b; β ⇐ b]

The elimination of synchronous communications, and parallelism, would give the 
following equivalent and simpler program. 



local a, b, z : integer

[produce a; produce b; z := a + b; consume z]

This formal transformation needs laws such as $P; \mathit{skip} \approx \mathit{skip}; P \approx P$,
associativity of parallelism, and a proper communication elimination law such as



[H'||iLn;[u:=e||P'-];[T'|r] 






These relations do not hold in SPL, but they are needed in order to carry out for- 
mal parallelization and communication simplification. A fundamental set of rela- 
tions needed in both transformations is justified mathematically in this work. In 
addition, a collection of basic tactics, as used in STeP and other theorem provers
and which may have originated in LCF, is defined as a necessary complement
of the rules. Tactics apply a sequence of transformation rules to a program with 
a specific transformation objective. They include the verification of applicabil- 
ity conditions, and accelerate the transformation process. They can be applied 
either interactively or from within a program, guaranteeing that the transforma- 
tion always corresponds to the application of a sequence of rules. It is important 
to remark that notations such as CCS and OCCAM do not have such convenient sets
of rules II 71271. The main reason for this is the existence of either non-determinism or
the hidden action. This problem is analyzed in the present work for SPL, which
does not have the needed set of rules either. As a result of the analysis, a
restriction must be imposed on SPL to endow it with the desired
rules. For convenience, we have also introduced a nil statement. The restriction
endows the notation with an intuitive set of basic relations which, although they
do not directly carry out communication elimination and/or introduction,
are needed for the application of the proper communication elimination and/or
introduction transformations. Parallelism associativity, and skip and nil intro- 
duction/elimination relative to concatenation and parallelism are in the set. It 
is also shown that the set of communication elimination and introduction laws 
cannot be finite, and that in order to obtain a practical set of transformations, 
congruence relations are not sufficient in general. Refinement relations have to 
be taken into consideration. The notion of refinement that we work with is taken 
from ED]. In general, existing work in refinement i^iviiiiati^2j has not been ap- 
plied to programs with synchronous communications in the direction which we 
explore. This adds another motivation. Although a recursive algorithm exists for 
the application of the infinite set of reductions, the present work does not go 
into it. For illustration purposes, a non-trivial distributed fast Fourier transform
(FFT) algorithm has been derived with the methods introduced in this work. 
The starting point is a sequential recursive algorithm. 

2 Background Notions and Notation 

2.1 Introduction 

The meaning of statements will be defined in terms of state transition systems. 
Then, the equivalence of statements will be based on the equivalence of their 
associated transition systems. Some basic notions are needed for that, and are 
introduced in this section without much elaboration. The reader is referred to 
p2;-if24j for further detail. A program denotes a fair transition system (FTS),
which is the tuple $\langle V, \Sigma, \Theta, \mathcal{T}, \mathcal{J}, \mathcal{C} \rangle$. Its components are, respectively, the set
of system variables, the set of states, the initial condition, the finite set of
transitions, and the sets of just and compassionate transitions. The set $V = Y \cup \{\pi\}$,
where $\pi$ is the control variable and $Y = \{y_1, \ldots, y_m\}$ is the set of data variables.
A transition $\tau$ is a function $\tau : \Sigma \to 2^{\Sigma}$ mapping a state to a possibly empty
set of successor states $\tau(s)$; it is characterized by a transition relation $\rho_\tau(V, V')$,
which is a first-order formula expressing the relation holding between a state $s$
and its $\tau$-successors $s'$. A transition is just (weakly fair) if, whenever it is continuously
enabled, it is eventually taken; consequently, it will be taken infinitely many
times as well. A transition is compassionate (strongly fair) if, whenever it is
enabled infinitely many times, it is eventually taken; consequently, it will
be taken infinitely many times as well. A run of an FTS is an infinite
sequence of states $\sigma : s_0, s_1, s_2, \ldots$ satisfying initiation ($s_0 \models \Theta$) and consecution
($s_{j+1} \in \tau(s_j)$ for some $\tau \in \mathcal{T}$). A computation of an FTS is a run all of whose transitions
satisfy their corresponding fairness requirements, expressed by the sets $\mathcal{J}$ and
$\mathcal{C}$. To define a practical notion of equivalence between transition
systems, it is enough to consider observable parts. A reduced behavior is
obtained from a computation $\sigma$ by retaining an observable part, relative to a set
of observed variables $O$, where $\pi \notin O$, and eliminating from it stuttering steps
(equivalent to the idling transition). $\mathcal{R}(S)$ is the set of all reduced behaviors of
a transition system $S$. Two transition systems $S_1$ and $S_2$ are equivalent relative
to a set $O$ of observed variables, denoted by $S_1 \sim S_2$, if $\mathcal{R}(S_1) = \mathcal{R}(S_2)$. A
system $S_c$ refines system $S_a$, written $S_c \sqsubseteq S_a$, if every reduced behavior of $S_c$
is also a reduced behavior of $S_a$.
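The reduced-behavior construction can be illustrated on finite prefixes of computations. Real computations are infinite sequences and must satisfy the fairness requirements, so this sketch is only an approximation under those assumptions: each state is a dictionary from variable names to values, the observable part keeps the variables in O, and stuttering steps (consecutive identical observations) are collapsed. Equivalence and refinement then become equality and inclusion of the resulting sets. All function names are hypothetical.

```python
from itertools import groupby


def reduce_behavior(states, observed):
    """Project each state onto the observed variables and drop stuttering steps."""
    projected = [tuple(sorted((v, s[v]) for v in observed)) for s in states]
    # groupby merges each run of equal consecutive observations into one.
    return tuple(obs for obs, _ in groupby(projected))


def reduced_set(computations, observed):
    """Finite stand-in for R(S): the reduced behaviors of the given computations."""
    return {reduce_behavior(c, observed) for c in computations}


def equivalent(sys1, sys2, observed):
    return reduced_set(sys1, observed) == reduced_set(sys2, observed)


def refines(sys_c, sys_a, observed):
    """S_c refines S_a if every reduced behavior of S_c is one of S_a."""
    return reduced_set(sys_c, observed) <= reduced_set(sys_a, observed)


# The same update of z, once directly and once after an idle (skip-like) step.
# The control variable "pi" is not observed, so the extra step stutters away.
direct = [[{"pi": 0, "z": 0}, {"pi": 1, "z": 3}]]
with_skip = [[{"pi": 0, "z": 0}, {"pi": 1, "z": 0}, {"pi": 2, "z": 3}]]
print(equivalent(direct, with_skip, {"z"}))  # True
```

Observing `pi` as well would distinguish the two systems, which is why the definition requires that the control variable not be observed.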



2.2 The Notation 

It is essential to define the precise notation to which the relations to be given 
apply. Since there are slight variations in the SPL notation throughout the two 
framework books, a schematic presentation of the subset of the SPL notation 
which we start with is included. Basically, the general selection statement has 
been restricted to only boolean and guarded communication selection forms. 
Also, in an initial attempt to obtain the above mentioned intuitive set of auxiliary 
relations, we have introduced a nil statement. The following table gives the 
definition of some basic statements of the notation: 



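Statement | Transition relation | Fairness | Labels
$\ell: \mathit{skip};\ \hat{\ell}:$ | $\rho_\ell: move(\ell, \hat{\ell}) \land pres(Y)$ | $\mathcal{J}$ |
$\ell: \mathit{nil};\ \hat{\ell}:$ | (none) | | $\ell \sim_L \hat{\ell}$
$\ell: u := e;\ \hat{\ell}:$ | $\rho_\ell: move(\ell, \hat{\ell}) \land u' = e \land pres(Y - \{u\})$ | $\mathcal{J}$ |
$\ell: \alpha \Leftarrow e;\ \hat{\ell}:$ and $m: \alpha \Rightarrow u;\ \hat{m}:$ ($\alpha$ synchronous) | $\rho_{\langle \ell, m \rangle}: move(\{[\ell], [m]\}, \{[\hat{\ell}], [\hat{m}]\}) \land u' = e \land pres(Y - \{u\})$ | $\mathcal{C}$ |
$\ell: \mathit{request}\ r;\ \hat{\ell}:$ | $\rho_\ell: move(\ell, \hat{\ell}) \land r > 0 \land r' = r - 1 \land pres(Y - \{r\})$ | $\mathcal{C}$ |
$\ell: \mathit{release}\ r;\ \hat{\ell}:$ | $\rho_\ell: move(\ell, \hat{\ell}) \land r' = r + 1 \land pres(Y - \{r\})$ | $\mathcal{J}$ |
$\ell: \alpha \Leftarrow e;\ \hat{\ell}:$ ($\alpha$ asynchronous) | $\rho_\ell: move(\ell, \hat{\ell}) \land \alpha' = \alpha \bullet e \land pres(Y - \{\alpha\})$ | $\mathcal{J}$ |
$\ell: \alpha \Rightarrow u;\ \hat{\ell}:$ ($\alpha$ asynchronous) | $\rho_\ell: move(\ell, \hat{\ell}) \land |\alpha| > 0 \land \alpha = u' \bullet \alpha' \land pres(Y - \{u, \alpha\})$ | $\mathcal{C}$ |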


Following Manna and Pnueli again, with each statement $S$ we associate a finite
set of transitions and a finite set of control locations. The table gives the
transition relation for each transition. An equivalence relation $\sim_L$ defined on
statement labels puts together labels which denote the same control location. A
location is an equivalence class of the label relation $\sim_L$. The location
corresponding to label $\ell$ will be denoted by $[\ell]$. Usually, $\ell$ and $\hat{\ell}$ are the pre and post labels
of a statement $S$. The special variable $\pi \in V$ ranges over sets of locations. Its
value in a state denotes all the locations in the program that are currently
active. The predicates $pres(U)$ and $move(L, \hat{L})$ express preservation of the values
of the variables in $U$, and movement of control from the set of control locations
$L$ to $\hat{L}$. The skip, nil, and assignment statements correspond to the first three
lines of the table. Notice that no transition is associated with the nil statement,
which puts its pre and post labels in the same class of the label relation. A joint
transition is associated with a pair of synchronous communication statements, send and
receive, over channel $\alpha$. This transition is included in the compassion
set $\mathcal{C}$, as in reference m. Next, the table specifies two semaphore statements
and the two asynchronous communication statements.
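Because a location is an equivalence class of the label relation $\sim_L$, the locations of a program can be computed from the label pairs with a union-find structure. This sketch assumes the pairs have already been collected from the statements, for instance the pre and post labels of a nil statement, which has no transition of its own; the class name and label strings are hypothetical.

```python
class Locations:
    """Union-find over statement labels; each class is one control location."""

    def __init__(self):
        self.parent = {}

    def find(self, label):
        """Return the representative label of the location containing `label`."""
        self.parent.setdefault(label, label)
        root = label
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every label on the path directly at the root.
        while self.parent[label] != root:
            self.parent[label], label = root, self.parent[label]
        return root

    def relate(self, l1, l2):
        """Record l1 ~L l2, i.e. the two labels denote the same location."""
        self.parent[self.find(l1)] = self.find(l2)

    def same(self, l1, l2):
        return self.find(l1) == self.find(l2)


# l0: nil; l1: skip; l2:  -- nil relates its pre and post labels, skip does not.
locs = Locations()
locs.relate("l0", "l1")
print(locs.same("l0", "l1"), locs.same("l1", "l2"))  # True False
```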

A pair of synchronous communication statements is said to be matching if
their position in a program is such that the above joint transition $\rho_{\langle \ell, m \rangle}$ could
be enabled. For instance, a send and a receive statement over the same channel
may match, but two send statements never do.

The semantics of some compound statements is given in the following table: 



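Statement | Transition relations
$\ell: [\ell_1: S_1; \hat{\ell}_1: \ldots; \ell_m: S_m; \hat{\ell}_m:];\ \hat{\ell}:$ | (none)
$\ell: [[\ell_1: S_1; \hat{\ell}_1:] \parallel \ldots \parallel [\ell_m: S_m; \hat{\ell}_m:]];\ \hat{\ell}:$ | $\rho^E_\ell: move(\{[\ell]\}, \{[\ell_1], \ldots, [\ell_m]\}) \land pres(Y)$
 | $\rho^X_\ell: move(\{[\hat{\ell}_1], \ldots, [\hat{\ell}_m]\}, \{[\hat{\ell}]\}) \land pres(Y)$
$\ell: [c(\alpha_1), c_1; S_1\ \mathbf{or} \ldots \mathbf{or}\ c(\alpha_m), c_m; S_m];\ \hat{\ell}:$ | $\rho_i: move(\ell, \ell_i) \land c_i \land \alpha_i' = \alpha_i \bullet e \land pres(Y - \{\alpha_i\})$
 | $\rho_i: move(\ell, \ell_i) \land c_i \land |\alpha_i| > 0 \land \alpha_i = u' \bullet \alpha_i' \land pres(Y - \{u, \alpha_i\})$
 | $\rho_{\langle \ell, m \rangle}: move(\{[\ell], [m]\}, \{[\ell_i], [\hat{m}]\}) \land c_i \land u' = e \land pres(Y - \{u\})$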



The label relations associated with the concatenation statement are $\hat{\ell}_i \sim_L \ell_{i+1}$
for $i = 1..m-1$, $\ell \sim_L \ell_1$, and $\hat{\ell}_m \sim_L \hat{\ell}$. The cooperation statement has
an entry and an exit transition, $\rho^E_\ell$ and $\rho^X_\ell$, associated with it. They are in the
justice set $\mathcal{J}$. No label relation is associated with this statement. Statements $S_i$
and $S_j$, $i \neq j$, of a cooperation statement are said to be parallel, and similarly for any
pair of their substatements, one in each. The three transitions shown
in the table for the communication selection statement are the possible types
of transitions, depending on whether the communication statement $c(\alpha_i)$ is an
asynchronous send, an asynchronous receive, or a synchronous communication statement. The
fairness set of these transitions is the one indicated in the first table. The pre and
post labels $\ell_i$ and $\hat{\ell}_i$ of substatements $S_i$ are not shown explicitly in the table. The
label relations associated with this statement are $\hat{\ell} \sim_L \hat{\ell}_1 \sim_L \ldots \sim_L \hat{\ell}_m$. The
boolean selection statement is the special case of the communication selection
statement in which the communication statements $c(\alpha_i)$ are nil (non-existent). The
general selection statement is not allowed.



Communication and Parallelism Introduction and Elimination 






The meaning of any program will be the FTS which can be associated
with the program using the statement semantics detailed above. Two programs 
are equivalent if their corresponding FTSs are equivalent. We proceed similarly 
with the notion of program refinement. 

2.3 Relations between Statements 

Transformation rules will correspond to congruence and refinement relations
between statements. The meaning of these relations is defined next. Unless stated
otherwise, $O = Y_1 \cap Y_2$. A program context $P[\ ]$ is a program $P$ one of
whose statements corresponds to a hole to be filled-in by an arbitrary statement 
S. With some abuse of notation P[S] will denote a program context, where 
S denotes the arbitrary statement placed in the hole. In some design scena- 
rios where synchronous communications are involved we need a more flexible 
notion. Before defining it we give some elementary concepts. Given a pair of 
matching synchronous communication statements, we say that one matches the 
other. When a joint synchronous communication transition is taken in a com- 
putation we say that a synchronous communication event has taken place. Two 
synchronous communication events are ordered if they take place in the same 
order in any computation. For instance, when the four synchronous communica- 
tion statements giving rise to two communication events are parallel statements 
then the two communication events are not ordered. 

Definition 1 (Flexible Program Context). Let $S$ be a statement having
synchronous communication operations $com(\alpha_1), \ldots, com(\alpha_n)$ with channels
$\alpha_1, \ldots, \alpha_n$ which are not local to $S$. These communication operations have to
match synchronous communication statements parallel to $S$ in any program
context $P[S]$. We say that a program context $P[S]$ is a flexible program context with
respect to the non-local communication statements of $S$ when the statements
parallel to $S$ in $P[S]$ are disjoint from those in $S$ and their matching communication
statements are placed in the program in such a way that no order is imposed upon
their corresponding communication events.

Example: $P[S] = S \parallel \mathbf{while}\ T\ \mathbf{do}\ \overline{com}(\alpha_1) \parallel \ldots \parallel \mathbf{while}\ T\ \mathbf{do}\ \overline{com}(\alpha_n)$,
where $\overline{com}(\alpha_i)$ is the synchronous communication statement which matches
$com(\alpha_i)$, a substatement of $S$. The motivation for introducing this notion is
the practical situation where we design a statement (program) by transformations
and, after we have obtained its desired form, we proceed to the design of
the parallel statements which will communicate with it without deadlock.

Two statements $S_1$ and $S_2$ are defined to be congruent, written $S_1 \approx S_2$,
if $P[S_1]$ and $P[S_2]$ are equivalent (that is, $P[S_1] \sim P[S_2]$) for all program
contexts $P[S]$. Then both statements are interchangeable in any program. We
say that statement $S_c$ refines statement $S_a$, written $S_c \sqsubseteq S_a$, if for all
program contexts $P[S]$, $P[S_c]$ refines $P[S_a]$ (that is, $P[S_c] \sqsubseteq P[S_a]$). We say that
statement $S_c$ is a flexible refinement of statement $S_a$, written $S_c \sqsubseteq_f S_a$, if
$P[S_c] \sqsubseteq P[S_a]$ for all flexible program contexts $P[S]$ with respect to the external
communication operations of $S_a$ and $S_c$.
3 Rules and Tactics for the Reduced Notation

3.1 Simple Cases 

The first set of rules is based on the following congruences. Let $p_m(k)$, where
$k = 1..m$, denote the $k$-th integer of a permutation of the list $(1, 2, \ldots, m)$. Then

$$\mathit{nil}; S \approx S, \qquad S; \mathit{nil} \approx S, \qquad [S_1 \parallel \ldots \parallel S_m] \approx [S_{p_m(1)} \parallel \ldots \parallel S_{p_m(m)}]$$

and

$$S_1; \ldots; S_k; \ldots; S_l; \ldots; S_m \approx S_1; \ldots; [S_k; \ldots; S_l]; \ldots; S_m$$

hold. The justification of these congruences is simple since, from the semantic
definition of the notation given in Section 2.2, it can be seen that both sides of
each congruence have the same associated FTS. The tactics based on the
above laws are simple. Here are some of them:

Tactic 1 (Parallelism Permutation (ParPerm)). Performs a permutation
of the substatements of a parallelism (cooperation) statement.

Inputs: A program, P. The label k of the cooperation substatement of P where
the permutation has to be applied. A list of naturals ln defining the desired
permutation.

Outputs: A boolean, done, which takes the value true if the applicability
conditions hold and the goal has been accomplished. The transformed program P'.

Tactic 2 (Concatenation Association (ConcatAsso)). Associates, as a
single substatement, a sequence of contiguous substatements of a concatenation
statement.

Inputs: A program, P. The label k of the concatenation substatement of P where
the association has to be done. Two naturals n1 and n2, 0 < n1 < n2, defining
the first and the last substatement of the contiguous sequence.

Outputs: A boolean, done, which takes the value true if the applicability
conditions hold and the goal has been accomplished. The transformed program P'.

The inverse would be a concatenation flattening tactic ConcatFlat. 
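The two tactics and the inverse ConcatFlat can be sketched over a toy abstract syntax. The representation is an assumption of this sketch: a statement is a string (an atomic statement) or a pair ("par", [substatements]) or ("seq", [substatements]), and the label k is modeled as a path of 0-based child indices from the root rather than as an SPL label. Substatement positions n1, n2 and permutation entries are 1-based, as in the text. Each tactic returns (done, P'), leaving P unchanged when its applicability conditions fail. By the congruences above, each rewrite preserves the associated FTS.

```python
def subterm(p, path):
    """Follow a path of child indices; None if the path is invalid."""
    for i in path:
        if not isinstance(p, tuple) or not 0 <= i < len(p[1]):
            return None
        p = p[1][i]
    return p


def replace_at(p, path, new):
    """Return a copy of p with the subterm at `path` replaced by `new`."""
    if not path:
        return new
    kind, kids = p
    i = path[0]
    return (kind, kids[:i] + [replace_at(kids[i], path[1:], new)] + kids[i + 1:])


def par_perm(p, k, perm):
    """ParPerm: permute the substatements of the cooperation statement at k."""
    s = subterm(p, k)
    if not (isinstance(s, tuple) and s[0] == "par"
            and sorted(perm) == list(range(1, len(s[1]) + 1))):
        return False, p
    return True, replace_at(p, k, ("par", [s[1][j - 1] for j in perm]))


def concat_asso(p, k, n1, n2):
    """ConcatAsso: group substatements n1..n2 of the concatenation at k."""
    s = subterm(p, k)
    if not (isinstance(s, tuple) and s[0] == "seq" and 0 < n1 < n2 <= len(s[1])):
        return False, p
    kids = s[1]
    grouped = kids[:n1 - 1] + [("seq", kids[n1 - 1:n2])] + kids[n2:]
    return True, replace_at(p, k, ("seq", grouped))


def concat_flat(p, k):
    """ConcatFlat: splice nested concatenations at k into their parent."""
    s = subterm(p, k)
    if not (isinstance(s, tuple) and s[0] == "seq"):
        return False, p
    flat = []
    for c in s[1]:
        flat.extend(c[1] if isinstance(c, tuple) and c[0] == "seq" else [c])
    return True, replace_at(p, k, ("seq", flat))


# Process P of the introductory example: [[α ⇒ x || β ⇒ y]; z := x + y; consume z]
P = ("seq", [("par", ["alpha => x", "beta => y"]), "z := x + y", "consume z"])
print(par_perm(P, [0], [2, 1]))
done, P2 = concat_asso(P, [], 2, 3)
print(concat_flat(P2, []) == (True, P))  # True
```

A fuller version would also check the side conditions of the restricted notation, such as the front-statement restriction of the associativity lemma, before reporting done.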

3.2 Rules of Skip 

The following remarks make the mathematical justification of the rules for the 
introduction/elimination of the skip statement hard. 

Remark 1 (Skip Concatenation Non-congruences). Let $S$ and $\tilde{S}$ be
statements. Then $S \not\approx S; \mathit{skip}$ and $S; \tilde{S} \not\approx S; \mathit{skip}; \tilde{S}$. Hence, in general,
$S \not\approx \mathit{skip}; S$.

As an intuitive clue to justify this remark, deleting a skip statement may enable
transitions associated with the statement which immediately follows it,
particularly joint synchronous communication transitions formed with a statement
parallel to it in some program context. This leads to an infinite number
of enablings when the skip statement is within an infinite loop in the program
context, and then some computation may be excluded from one side due to
violation of the fairness requirements with respect to the enabled transition.
Remark 2 (Skip and Nil Cooperation Non-congruences). Let $S$ be a
statement. Then $S \not\approx S \parallel \mathit{skip}$ and $S \not\approx S \parallel \mathit{nil}$.

The proof of these remarks is given in the appendix. The following theorem is
important since it identifies, within the reduced notation, the fairness assumptions
about communication statements as being responsible for the irregular beha- 
vior of the skip statement, as in the above non-congruences. We remark, as the 
appendix shows, that without strong fairness the congruences would hold. 



Theorem 1 (Concatenated Skip Deletion). Let $S$ be a statement distinct
from the nil statement. Let $S_{ncs}$ be a statement which is neither a
communication selection, nor a synchronous communication, nor a nil statement. Then
$S; S_{ncs} \approx S; \mathit{skip}; S_{ncs}$.



The proof is given in the appendix.

Theorem 2 (Parallel Skip and Nil Deletion). Let $S$ be a statement
distinct from the nil statement. Let $S_{ncs}$ and $S'_{ncs}$ be statements which are neither
communication selection, nor synchronous communication, nor nil statements.
Let $\tilde{S}$ be an arbitrary statement. Then

$$S; S_{ncs}; \tilde{S}; S'_{ncs} \approx S; [\mathit{skip} \parallel [S_{ncs}; \tilde{S}]]; S'_{ncs}$$

and

$$S; S_{ncs}; \tilde{S}; S'_{ncs} \approx S; [\mathit{nil} \parallel [S_{ncs}; \tilde{S}]]; S'_{ncs}.$$



The proof is similar to that of the previous theorem. Here the entry and
exit transitions of the cooperation statement, which are of the skip type, play
the same role as the transition associated with the skip statement in the previous
theorem. We remark that the congruences of the two previous theorems also hold
without the restriction when no transition associated with the communication
statements is in the compassion set $\mathcal{C}$, as the proof in the appendix shows.

Lemma 1 (Associativity of Cooperation). In general

$$[S_1 \parallel \cdots \parallel S_k \parallel \cdots \parallel S_l \parallel \cdots \parallel S_m] \approx [S_1 \parallel \cdots \parallel [S_k \parallel \cdots \parallel S_l] \parallel \cdots \parallel S_m]$$

provided that the front statements of $S_k, \ldots, S_l$ are neither synchronous communication statements nor communication selection statements. Other congruences between cooperation statements with arbitrary associations of their substatements follow from the above congruence.

The proof of this congruence is similar to that of the skip deletion theorems above; it is omitted due to space restrictions. We note that the entry and exit transitions associated with the main cooperation statement are present on both sides. The entry and exit transitions of the inner cooperation statement are present on one side only; therefore, in moving from one side to the other, these inner skip-type transitions are deleted. The outer exit transition prevents the situation excluded in Theorem 1, namely a skip-type transition immediately followed by a communication statement. However, this is not the case for the inner entry transition, hence the proviso on the front statements of $S_k, \ldots, S_l$.
4 Communication Elimination and Introduction

4.1 Restriction of SPL and New Tactics 

The above results reduce the set of congruences in the notation introduced so far, since they prevent the deletion of skip and nil statements in general: one would always have to check whether or not a synchronous communication or a communication selection statement follows the skip statement. Furthermore, the congruences involving cooperation statements which have nil as a substatement are needed in the process of synchronous communication elimination and introduction. They allow programs to be transformed into forms where the elimination-introduction laws can apply. With the objective of obtaining the needed set of congruences, which overcomes these drawbacks, we introduce the delayed communication selection statement as the following abbreviation:

$$\begin{aligned} &[\,com(\alpha_1)\ \mathbf{provided}\ c_1; S_1\ \mathbf{dor}\ \cdots\ \mathbf{dor}\ com(\alpha_m)\ \mathbf{provided}\ c_m; S_m\,] \\ &\quad \stackrel{\mathrm{def}}{=}\ \mathtt{skip};\ [\,com(\alpha_1)\ \mathbf{provided}\ c_1; S_1\ \mathbf{or}\ \cdots\ \mathbf{or}\ com(\alpha_m)\ \mathbf{provided}\ c_m; S_m\,] \end{aligned}$$

The restricted notation, to be used in the rest of this work, will not use communication selection statements but only delayed communication selection statements. Furthermore, a skip statement will also be inserted implicitly just before any isolated synchronous send or receive statement. These hidden skip statements guarantee that the conditions of Theorems 1 and 2 and of Lemma 1 always hold. Therefore their congruences also hold, and we obtain associativity of cooperation and the intuitive congruences:

$$\mathtt{skip}; S \approx S, \qquad S; \mathtt{skip} \approx S, \qquad \mathtt{skip} \parallel S \approx S, \qquad \text{and} \qquad \mathtt{nil} \parallel S \approx S.$$

The introduction of hidden skip statements may be done easily by the com- 
piler. Notice that what ultimately matters is the scheduling policy to be used 
with the programs. Therefore, an alternative way to justify the use of the above 
intuitive laws is to put all the transitions in the justice set including the ones 
associated with the communication statements. 
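The compiler pass that introduces the hidden skip statements can be sketched as a small syntax-directed rewrite. The Python below is only an illustration under stated assumptions: the statement classes (`Skip`, `Nil`, `Assign`, `Send`, `Recv`, `ComSelect`, `Seq`, `Par`) and the function `insert_hidden_skips` are hypothetical and not part of SPL or any existing tool. The pass prefixes every isolated send or receive with a skip and turns every communication selection into a delayed one, `skip; [ ... or ... ]`, leaving the guarding communications inside the selection in place.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical abstract syntax for a fragment of SPL (illustration only).
@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Assign:            # var := expr
    var: str
    expr: str


@dataclass(frozen=True)
class Send:              # chan <= expr
    chan: str
    expr: str


@dataclass(frozen=True)
class Recv:              # chan => var
    chan: str
    var: str


@dataclass(frozen=True)
class ComSelect:         # [ com(a1); S1 or ... or com(am); Sm ]
    branches: Tuple[Tuple[Union[Send, Recv], "Stmt"], ...]


@dataclass(frozen=True)
class Seq:               # S1; ...; Sn
    body: Tuple["Stmt", ...]


@dataclass(frozen=True)
class Par:               # [S1 || ... || Sn]
    body: Tuple["Stmt", ...]


Stmt = Union[Skip, Nil, Assign, Send, Recv, ComSelect, Seq, Par]


def insert_hidden_skips(s: Stmt) -> Stmt:
    """Rewrite a statement into the restricted notation (sketch, run once)."""
    if isinstance(s, (Send, Recv)):
        # Isolated synchronous communication: prefix a hidden skip.
        return Seq((Skip(), s))
    if isinstance(s, ComSelect):
        # Guards stay in place; only the branch continuations are rewritten.
        branches = tuple((guard, insert_hidden_skips(cont)) for guard, cont in s.branches)
        # Delayed communication selection: skip; [ ... or ... ].
        return Seq((Skip(), ComSelect(branches)))
    if isinstance(s, Seq):
        return Seq(tuple(insert_hidden_skips(x) for x in s.body))
    if isinstance(s, Par):
        return Par(tuple(insert_hidden_skips(x) for x in s.body))
    return s  # skip, nil and assignments are left unchanged
```

For example, `[alpha <= e || alpha => u]` becomes `[skip; alpha <= e || skip; alpha => u]`. The pass is meant to be applied once, as a normal-form step: applying it again would add a second skip before each communication.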

As a consequence of working in the restricted notation, Lemma 1 holds, and the tactics for association and flattening in cooperation statements, ParAsso and ParFlat, can be defined in the same way as tactics ConcatAsso and ConcatFlat, respectively.

4.2 A Schema of Relations 

This subsection considers the case of only two parallel processes, each of them containing a synchronous communication statement, the two statements matching each other. We start with a simple intuitive congruence relation.

Lemma 2 (Simple Communication Elimination and Introduction). Let $H^l$ and $H^r$ be statements which do not have communication statements through synchronous channel $\alpha$, and let $T^l$ and $T^r$ be statements. Then,

This proof is omitted since the result is a special case of the schema of relations of Theorem 4 below. The congruence $[\alpha \Leftarrow e \parallel \alpha \Rightarrow u] \approx u := e$ is a special case obtained by making $H^l = H^r = T^l = T^r = \mathtt{nil}$.
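The special case $[\alpha \Leftarrow e \parallel \alpha \Rightarrow u] \approx u := e$ can be illustrated operationally. The Python sketch below models an unbuffered (rendezvous) channel with `threading.Condition`; the class `SyncChannel` and the function `cooperate_send_receive` are hypothetical helpers written for this illustration, not part of SPL. Running the two matching communications in parallel leaves the receiver's variable holding the value of the sent expression, which is the final effect of the assignment $u := e$. It only shows the final state for one run, not the congruence over all program contexts.

```python
import threading


class SyncChannel:
    """Illustrative unbuffered channel: a send completes once its value is taken."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._full = False      # a sent value is waiting to be taken
        self._value = None
        self._sent = 0          # number of values offered so far
        self._taken = 0         # number of values taken so far

    def send(self, value) -> None:          # alpha <= value
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            ticket = self._sent
            self._sent += 1
            self._cond.notify_all()
            while self._taken <= ticket:     # rendezvous: wait for the matching receive
                self._cond.wait()

    def receive(self):                      # alpha => var
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._taken += 1
            self._cond.notify_all()
            return value


def cooperate_send_receive(e):
    """Run [alpha <= e || alpha => u] and return the final value of u."""
    alpha = SyncChannel()
    env = {}

    def left() -> None:
        alpha.send(e)

    def right() -> None:
        env["u"] = alpha.receive()

    threads = [threading.Thread(target=left), threading.Thread(target=right)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return env["u"]
```

For instance, `cooperate_send_receive(42)` returns `42`, exactly the value left in `u` by `u := 42`.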

Theorem 3 (Incompleteness of Any Finite Set of Laws). No finite set of laws, congruences or refinement relations, suffices to syntactically eliminate a pair of synchronous communication statements from restricted SPL statements.

Proof Outline. Consider the statement $[\alpha \Leftarrow e \parallel H^r; [\alpha \Rightarrow u \parallel P^r]; T^r]$, which is a simple extension of the left-hand side of the congruence above, and where $P^r$ is an arbitrary statement. The communication through synchronous channel $\alpha$ cannot be eliminated with the communication elimination lemma above. Restricting the statements to those constructible with the concatenation and cooperation operators only, assume that we have found a relation $G \sqsubseteq [L \parallel R]$ such that $G$ does not contain the synchronous communication pair connecting $L$ and $R$, and that would eliminate the communication from the considered statement. Nevertheless, this law would not be sufficient either: consider the statement $L \parallel [H; [R \parallel P]; T]$, formed with the same $L$ and $R$, where $H$ does not contain synchronous communication statements through $\alpha$. The assumed new law, which eliminated the communication from $L \parallel R$ to obtain $G$, does not unify with the new statement due to the presence of $P$. Therefore, a new communication elimination law is needed. This reasoning can be iterated indefinitely.

Theorem 4 (Communication Elimination and Introduction). In the following, all parallel processes are assumed to be disjoint, in the sense that they only read their shared data variables and they do not communicate through asynchronous channels.

- Let $T^l_k$ and $T^r_k$ be statements. Let $H^l_k$ and $H^r_k$ be statements which do not have communication operations through synchronous channel $\alpha$. In both definitions $k = 0, 1, \ldots$.

- Let $G^l_0 = [\alpha \Leftarrow e]$ and $G^r_0 = [\alpha \Rightarrow u]$, and for $k = 1, 2, \ldots$

$$G^l_k = H^l_{k-1}; [G^l_{k-1} \parallel P^l_{k-1}]; T^l_{k-1} \quad \text{and} \quad G^r_k = H^r_{k-1}; [G^r_{k-1} \parallel P^r_{k-1}]; T^r_{k-1}$$

where statements $P^l_k$ and $P^r_k$ can be expressed as $P^l_k = [P^{lH}_k \parallel P^{lG}_k \parallel P^{lT}_k]$ and $P^r_k = [P^{rH}_k \parallel P^{rG}_k \parallel P^{rT}_k]$, and where any of the $P$ processes may be nil, in such a way that the following holds:

• None of the $P$ statements contain communication statements through channel $\alpha$.

• $P^{lH}_k$ communicates only with $H^r_k$ and $P^{rH}_k$.

• $P^{rH}_k$ communicates only with $H^l_k$ and $P^{lH}_k$.

• $P^{lG}_k$ communicates only with $G_k$, to be defined below, and $P^{rG}_k$.

• $P^{rG}_k$ communicates only with $G_k$, to be defined below, and $P^{lG}_k$.

• $P^{lT}_k$ communicates only with $T^r_k$ and $P^{rT}_k$.

• $P^{rT}_k$ communicates only with $T^l_k$ and $P^{lT}_k$.

- Let $G_0 = [u := e]$, and for $k = 1, 2, \ldots$

$$G_k = [H^l_{k-1} \parallel P^{lH}_{k-1} \parallel H^r_{k-1} \parallel P^{rH}_{k-1}]; [G_{k-1} \parallel P^{lG}_{k-1} \parallel P^{rG}_{k-1}]; [T^l_{k-1} \parallel P^{lT}_{k-1} \parallel T^r_{k-1} \parallel P^{rT}_{k-1}]$$

Then $G_n \sqsubseteq [G^l_n \parallel G^r_n]$ for $n = 1, 2, \ldots$

The proof is not given due to paper length restrictions. We note that in quite a number of cases the refinement relation of the above theorem can be a congruence relation. By making the required statements equal to the nil statement, the above theorem can be applied to a great variety of special cases where the cooperation statement to the right of the refinement symbol may be $[G^l_n \parallel G^r_m]$, with $n \neq m$. The following two tactics implement the recursive application of the above schema of laws.

Tactic 3 (Communication Elimination (ComElim)). Given two matching synchronous communication statements within some cooperation statement of a program, transforms the program according to Theorem 4 above, replacing the communication statements by their corresponding assignment.

Inputs: A program, P. The labels k1 and k2 of the matching pair of communication statements.

Outputs: A boolean, done, which takes the value true if the applicability conditions hold and the goal has been accomplished. The transformed program P'.



Tactic 4 (Communication Introduction (ComIntro)). Given an assignment statement within a program, transforms the program according to Theorem 4 above, replacing the assignment statement by a pair of matching communication statements in parallel. If necessary, a new cooperation statement is introduced.
Inputs: A program, P. The label k of the assignment statement. The identifier newc of the channel; if the identifier is null, a suitable identifier is generated.
Outputs: A boolean, done, which takes the value true if the applicability conditions hold and the goal has been accomplished. The transformed program P'.



5 Modular Procedures and Related Transformations 

Both modules and procedures are needed in the applications. SPL modules are
defined in reference m, and they are extended for verification purposes in refer- 
ence PI- The notion of modular procedure combines the notions of module and 
of procedure. As used in the present work, it agrees with that of SPL module, but adds the possibility of referring to it at any point in a program. Like modules, modular procedures can be composed in parallel. A reference statement to a modular procedure makes explicit all the names of the channels through which it interacts with processes parallel to the invocation statement. It also makes explicit the names of the variables shared with the rest of the program. The order of these names in the reference statement corresponds to their order within the modular procedure interface definition. From now on, modular procedures will be referred to simply as procedures. In the following example two modes of interface statements will be used. Mode out specifies an output variable or channel to be written only in the current procedure: no statement parallel to the reference statement can write into the variable or send via the channel. Mode external in specifies the possibility that a statement parallel to the reference writes into the variable or sends via the channel. For the purposes of program transformation, we define the semantics of the reference statement to be consistent with the operation of replacing this statement by the modular procedure body, possibly with a renaming of variables and channels, and with the reverse operation of encapsulating a part of a program (a statement) within a procedure. Both operations are defined as transformation rules:

Rule 1 (Replace Procedure Reference (RepRef)). Given a program P and the label k of a reference substatement within it, rule RepRef replaces the reference by the body of the procedure whose name is specified in the reference statement. The names of the interface variables and channels occurring in the body are replaced by their corresponding names in the reference statement.
Inputs: A program, P, which includes the declaration of the referred procedure. The label k of the reference statement.

Outputs: A boolean, done, which takes the value true if the applicability conditions hold and the goal has been accomplished. The transformed program P'.



Rule 2 (Encapsulate within a Reference (EncRef)). Given a program P and the label k of one of its statements, rule EncRef replaces the statement by a reference statement to an already defined procedure. The statement has to match the procedure body, up to a relabeling of the names of variables and channels appearing in the procedure interface definition. This is checked as part of the applicability condition of the rule.

Inputs: A program, P. The label k of the statement. The name, name, of the procedure whose body the statement has to match.

Outputs: A boolean, done, which takes the value true if the applicability conditions hold and the goal has been accomplished. The transformed program P'.
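Rules RepRef and EncRef are inverse syntactic operations. As a rough illustration, the Python sketch below represents statements as nested tuples of strings and a procedure as a pair (interface names, body); this representation and the names `rename`, `rep_ref` and `enc_ref` are hypothetical. `rep_ref` substitutes the actual names of a reference for the interface names in the body, and `enc_ref` checks that a statement matches a body up to a one-to-one relabeling of interface names and, if so, returns the reference. Like the rules, both return a `done` flag together with the result. The sketch ignores local declarations and name capture, which a real implementation would have to handle.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: a term is a name/keyword or a tuple of sub-terms.
Term = Union[str, tuple]
Proc = Tuple[Tuple[str, ...], Term]       # (interface names, body)


def rename(term: Term, mapping: Dict[str, str]) -> Term:
    """Replace every occurrence of a mapped name inside a term."""
    if isinstance(term, str):
        return mapping.get(term, term)
    return tuple(rename(t, mapping) for t in term)


def rep_ref(procs: Dict[str, Proc], ref: Tuple[str, Tuple[str, ...]]):
    """RepRef sketch: replace a reference (name, actuals) by the renamed body."""
    name, actuals = ref
    if name not in procs or len(procs[name][0]) != len(actuals):
        return False, ref
    formals, body = procs[name]
    return True, rename(body, dict(zip(formals, actuals)))


def enc_ref(procs: Dict[str, Proc], stmt: Term, name: str):
    """EncRef sketch: encapsulate stmt as a reference to procedure `name`."""
    if name not in procs:
        return False, stmt
    formals, body = procs[name]
    mapping: Dict[str, str] = {}

    def match(pat: Term, t: Term) -> bool:
        if isinstance(pat, str):
            if pat in formals:                    # interface name: bind or check binding
                if pat in mapping:
                    return mapping[pat] == t
                if not isinstance(t, str):
                    return False
                mapping[pat] = t
                return True
            return pat == t                       # any other name must match literally
        return (isinstance(t, tuple) and len(pat) == len(t)
                and all(match(p, x) for p, x in zip(pat, t)))

    ok = (match(body, stmt)
          and len(mapping) == len(formals)                  # every interface name occurs
          and len(set(mapping.values())) == len(mapping))   # relabeling is one-to-one
    if not ok:
        return False, stmt
    return True, (name, tuple(mapping[f] for f in formals))
```

Applying `rep_ref` and then `enc_ref` to the same procedure gives back the original reference, which is the round trip the two rules are meant to support.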



6 Parallelization of an FFT 

The architecture to be designed computes the discrete Fourier transform (DFT) $F_N$ of a vector $f$ of order $N$. The DFT is defined as:

$$F_N[n] = \frac{1}{N} \sum_{m=0}^{N-1} f[m]\, w_N^{nm}, \qquad n = 0, 1, \ldots, N-1 \qquad (1)$$

where $w_N = e^{-j2\pi/N}$ and $N$ is a power of 2, $N = 2^p$, $p = 0, 1, \ldots$. The computation is based on the Fast Fourier Transform (FFT) algorithm in its Decimation in Frequency (DF) variant.

Miquel Bertran et al.

Its basic step is the computation of the two half-order vectors $f_0$ and $f_1$ in terms of the original $f$ by the equations:

$$f_0[m] = \frac{1}{2} \left( v_0[m] + v_1[m] \right) \qquad (2)$$

$$f_1[m] = \frac{1}{2} \left( v_0[m] - v_1[m] \right) w_N^{m} \qquad (3)$$

for $m = 0, 1, \ldots, N/2 - 1$. Vectors $v_0$ and $v_1$ are of order $N/2$, and are defined as $v_0[m] = f[m]$ and $v_1[m] = f[m + N/2]$ for $m = 0, 1, \ldots, N/2 - 1$. They correspond to the lower and upper halves of vector $f$. The computation associated with the two formulas for a fixed $m$ is known as a DF-butterfly. Thus, the basic step computes $N/2$ butterflies. Then, the even- and odd-indexed components of $F_N$ are obtained as the DFT of $f_0$ and $f_1$, respectively. The same procedure could be repeated to compute these DFTs of order $N/2$. The DF-FFT algorithm applies the basic step recursively until the order $N = 1$ is reached. Then, by equation (1), $F_1 = f$. It corresponds to the following procedure:

fout ::= FFT(fin, w, p) ::
    [ out          fout                 : vector(p)
      external in  fin                  : vector(p)
      external in  w                    : array [k := p..1] of vector(k - 1)
      external in  p                    : integer
      local        f0, f1, fout0, fout1 : vector(p - 1)

      [ when p > 0 do
          [ (f0, f1) ::= BStep(fin, w, p);
            [ fout0 ::= FFT(f0, w, p - 1);
              fout1 ::= FFT(f1, w, p - 1) ];
            fout := (fout0, fout1) ]
        or
        when p = 0 do [fout := fin] ] ]
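As a cross-check of equations (1) to (3) and of procedure FFT, the following Python sketch implements the same recursion on lists of complex numbers. It assumes $w_N = e^{-j2\pi/N}$ and the $1/N$ normalization of equation (1); the names `bstep`, `fft`, `dft` and `bit_reverse` are hypothetical. Because each level returns the concatenation `fft(f0) + fft(f1)`, component $i$ of the result holds $F_N$ at the bit-reversed index of $i$, which is the sense in which the output of FFT is not in natural order.

```python
import cmath
from typing import List, Tuple


def bstep(f: List[complex]) -> Tuple[List[complex], List[complex]]:
    """Basic DF step: N/2 butterflies, equations (2) and (3)."""
    n = len(f)
    half = n // 2
    w_n = cmath.exp(-2j * cmath.pi / n)          # assumed w_N = e^{-j 2 pi / N}
    f0 = [0.5 * (f[m] + f[m + half]) for m in range(half)]
    f1 = [0.5 * (f[m] - f[m + half]) * w_n ** m for m in range(half)]
    return f0, f1


def fft(f: List[complex]) -> List[complex]:
    """Recursive DF-FFT mirroring procedure FFT; len(f) must be 2**p."""
    if len(f) == 1:                  # p = 0: F_1 = f
        return list(f)
    f0, f1 = bstep(f)                # p > 0: basic step, then two half-order FFTs
    return fft(f0) + fft(f1)


def dft(f: List[complex]) -> List[complex]:
    """Direct evaluation of equation (1), for comparison."""
    n = len(f)
    return [sum(f[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def bit_reverse(i: int, p: int) -> int:
    """Reverse the p low-order bits of i."""
    r = 0
    for _ in range(p):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r
```

For example, `fft([1, 0, 0, 0])` returns four values equal to 0.25, the DFT of a unit impulse under the $1/N$ normalization.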



From now on, the notation vector(p) will stand for array [1..2^p] of complex. Procedure BStep corresponds to the basic step, and computes the two half-order vectors f0 and f1 in terms of the original vector fin by an iterative application of equations (2) and (3). The triangular matrix w contains the p vectors of the roots of unity which are needed at each level of the computation tree. At the end of the computation the components of $F_N$ are not ordered in fout: they appear in bit-reversed index order. A communicating version of the FFT can be defined as:



β ::= ComFFT(α, w, p) ::
    [ out          β       : channel of vector(p - 1)
      external in  α       : channel of vector(p)
      external in  w       : array [k := p..1] of vector(k - 1)
      external in  p       : integer
      local        in, out : vector(p)
      local        f0, f1  : vector(p - 1)

      α ⇒ in;
      out ::= FFT(in, w, p);
      (f0, f1) := out;
      β ⇐ f0; β ⇐ f1 ]
It is valid for p > 0. It simply communicates its input and output vectors via synchronous channels. The output is formed by two half-order vectors sent in series via channel β. Similarly, for the basic step:



β ::= ComBStep(α, w, p) ::
    [ local  in     : vector(p)
      local  f0, f1 : vector(p - 1)

      α ⇒ in;
      (f0, f1) ::= BStep(in, w, p);
      β ⇐ f0; β ⇐ f1 ]


The declaration of the interface variables has been omitted since it is the same as before. The same is done for the next procedure, which shares the same interface. A pipelined FFT architecture, SerialFFT, is defined as:



β ::= SerialFFT(α, w, p) ::
    [ local  f0, f1    : vector(p - 1)
      local  k, l      : integer
      local  γ         : array [k := 0..p] of channel of vector(p - k)
      local  (γ(0), α) : equivalence

      [ ||_{k=0}^{p-1} [ for l := 1 to 2^k do γ(k + 1) ::= ComBStep(γ(k), w, p - k) ] ]
      ||
      [ for m := 1 to 2^(p-1) do γ(p) ⇒ f0(m);
        for m := 1 to 2^(p-1) do γ(p) ⇒ f1(m);
        β ⇐ f0; β ⇐ f1 ] ]



It is a pipeline architecture with p processes communicating via channels γ(k), followed by a process which constructs vectors f0 and f1 and outputs them in sequence via channel β. Each of the first p processes computes a number of basic steps in sequence, starting with one for the process corresponding to k = 0, and doubling the number of basic steps of its predecessor in the pipeline; that is, process k computes 2^k basic steps. With the above definitions, the following congruence holds:
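The dataflow of SerialFFT can be simulated sequentially, with FIFO queues standing in for the synchronous channels γ(k). In the Python sketch below (hypothetical names `serial_fft` and `recursive_fft`; the basic step follows equations (2) and (3) under the assumed convention $w_N = e^{-j2\pi/N}$), stage k performs 2^k basic steps, each consuming one vector of order p − k from γ(k) and emitting its two halves, in series, on γ(k + 1). The final process collects the 2^p scalars into f0 and f1. Since every stage is deterministic and the channels preserve order, running the stages one after another yields the same values as the concurrent pipeline. On sample inputs the result agrees with the recursive procedure, consistent with Lemma 3.

```python
import cmath
from collections import deque
from typing import List, Tuple


def bstep(f: List[complex]) -> Tuple[List[complex], List[complex]]:
    """DF basic step, equations (2) and (3), with assumed w_N = e^{-j 2 pi / N}."""
    half = len(f) // 2
    w_n = cmath.exp(-2j * cmath.pi / len(f))
    f0 = [0.5 * (f[m] + f[m + half]) for m in range(half)]
    f1 = [0.5 * (f[m] - f[m + half]) * w_n ** m for m in range(half)]
    return f0, f1


def recursive_fft(f: List[complex]) -> List[complex]:
    """The sequential recursive procedure FFT (output in bit-reversed order)."""
    if len(f) == 1:
        return list(f)
    f0, f1 = bstep(f)
    return recursive_fft(f0) + recursive_fft(f1)


def serial_fft(f: List[complex], p: int) -> Tuple[List[complex], List[complex]]:
    """Simulate pipeline SerialFFT for p > 0; returns (f0, f1) as sent on beta."""
    assert p > 0 and len(f) == 2 ** p
    gamma = [deque() for _ in range(p + 1)]     # gamma[0] is identified with alpha
    gamma[0].append(list(f))                    # alpha <= f
    for k in range(p):                          # process k of the pipeline
        for _ in range(2 ** k):                 # 2**k basic steps in sequence
            v = gamma[k].popleft()              # gamma(k) => in
            h0, h1 = bstep(v)
            gamma[k + 1].append(h0)             # the two half-order vectors in series
            gamma[k + 1].append(h1)
    half = 2 ** (p - 1)
    f0 = [gamma[p].popleft()[0] for _ in range(half)]   # gamma(p) => f0(m)
    f1 = [gamma[p].popleft()[0] for _ in range(half)]   # gamma(p) => f1(m)
    return f0, f1
```

Using deques is a design shortcut: a faithful concurrent model would use rendezvous channels and one thread per stage. For FIFO dataflow between deterministic stages, however, the values produced do not depend on the interleaving.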

Lemma 3 (Serial FFT Architecture). For $p > 0$:

$$[\beta ::= ComFFT(\alpha, w, p)] \approx [\beta ::= SerialFFT(\alpha, w, p)]$$

The set of observed variables is now $\mathcal{O} = \{var(\alpha), var(\beta)\}$, where $var(\alpha)$ denotes the variables associated with synchronous channel $\alpha$ via communication statements. The proof cannot be given due to paper length restrictions. It is carried out by induction on $p$. Both the base case and the induction step are proved with transformations involving the tactics and rules which have been covered so far, plus some simple tactics for the rearrangement of sequences of parallel disjoint processes and iterations, and for the elimination of variables.

7 Conclusions and Future Work 

The mathematical justification of a set of transformation rules for the elimination 
and introduction of synchronous communications and parallelism in imperative 
concurrent programs has been accomplished. A set of tactics has been defined for the application of the rules; they ease the generation of broader transformations, guaranteeing that these are always the consequence of a certain sequence of rule applications. As an illustration, a distributed FFT algorithm has been derived
with the methods introduced in this work, starting from a sequential recursive 
version. It has been shown that an infinite set of relations is needed in general for 
the elimination of a matching pair of synchronous communication statements. 
All this has been worked out in the framework of Manna and Pnueli for reactive 
systems, and with the usual fairness assumptions. However, a restriction on the 
SPL notation has been needed, which is not important for the applications, and 
has endowed the notation with very intuitive congruences for the introduction 
and elimination of skip substatements in concatenation and cooperation state- 
ments. These are needed in conjunction with the proper elimination-introduction 
rules. The main treatment has been limited to programs formed with basic, con- 
catenation, and cooperation statements. Although this is sufficient to show, as 
it has been done, that no finite set of relations is complete, the laws and tactics 
could be extended to general programs with selection statements. They could be 
incorporated into an interactive system complementing verification and model 
checking in an integrated formal design environment. This will form a good basis 
for an interactive transformation system. Nevertheless, more work is needed in 
the evaluation of heuristics to guide the rewriting. Once all this work is com- 
pleted, repetition of verification and/or model checking for related programs 
may be avoided in many cases by carrying out communication and parallelism 
introduction-elimination transformations, forming part of a comprehensive for- 
mal design process. 
A Proof of the Skip Non-congruences

Proof of Remark 1. In order to prove the first relation, define the program context $P[S]$ as the program shown below. Then $P[\alpha \Leftarrow T]$ always terminates under standard fairness. This is due to the synchronization between $P_1$ and $P_3$ imposed by the synchronous communication via channel $\alpha$. Just after this transition is taken, the joint transition between the same two processes, but via $\beta$, is enabled. Since both communication statements occur within a loop in both processes, this enabling recurs infinitely often as long as the loops do not terminate.
local x, y, z, v : boolean where x = T, y = T, z = T, v = T
local α, β, γ, δ : channel of boolean

P1 :: k0: while x do
          [ k1: S; k2: [ k3: β ⇒ x; k4: α ⇐ F; k5: γ ⇐ F
                         or
                         k6: γ ⇐ T ] ]
||
P2 :: while y do [ γ ⇒ y ]
||
P3 :: m0: while z do
          [ m1: α ⇒ z; m2: [ m3: β ⇐ F; m4: α ⇒ z; m5: δ ⇐ F
                             or
                             m6: δ ⇐ T ] ]
||
P4 :: n0: while v do [ n1: δ ⇒ v ]



But since the joint transitions associated with synchronous communication statements are in the compassion set (strongly fair), the synchronous communication via $\beta$ has to be taken eventually. Once this occurs, the variables x, z, v, and y take the value false and the four processes terminate.

The communication via $\beta$ may never occur in the program $P[\alpha \Leftarrow T; \mathtt{skip}]$, since the presence of the new skip statement allows infinite computations in which the communication via $\beta$ is never enabled.

This is so because now, due to the skip statement, the synchronization via channel $\alpha$ does not necessarily activate the control locations corresponding to labels $k_2$ and $m_2$ simultaneously; consequently, the joint transition of the synchronous communication via channel $\beta$ may never be enabled. Therefore program $P[\alpha \Leftarrow T; \mathtt{skip}]$ has a non-terminating computation. This proves the first non-congruence. The other two are consequences of the first. □

The proof of the other remark is similar. 



B Proof of Theorem 1

A reduced run of an FTS is obtained from a run in the same way that a reduced behavior is obtained from a computation.

Definition 2 (Reduced Run). Let $\mathcal{M}$ be an FTS, and $\mathcal{O}$ a set of observed variables, where $\pi$ is not in $\mathcal{O}$. Then a reduced run $r^{r}$ is obtained from a run $r$ of $\mathcal{M}$ by retaining the observable part of all the states appearing in $r$ and deleting any state which is equal to its predecessor but not equal to its successor.
Therefore, stuttering steps are removed from the observable part of a run
provided that they do not correspond to a terminal state. Notice also that a 
reduced run does not need to satisfy any fairness requirements. Then, reduced 
behaviors would be the subset of reduced runs which satisfy the fairness require- 
ments. The concept of reduced run can be extended to programs, as the reduced 
run of the FTS associated with the program. 
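Definition 2 can be applied mechanically to a finite prefix of a run. The Python sketch below is an illustration under stated assumptions: states are dictionaries from variable names to values, the control variable `pi` is simply left out of the observed set, each state is tested against its neighbours in the original run, and the last state of the prefix is treated as repeating forever, so a terminal stuttering suffix is kept. The function names `observe` and `reduced_run` are hypothetical. The example mirrors the first step of the proof of Lemma 4: inserting a skip transition duplicates an observable state, and the duplicate disappears in the reduced run.

```python
from typing import Dict, FrozenSet, List, Tuple

State = Dict[str, object]
Obs = Tuple[Tuple[str, object], ...]


def observe(state: State, observed: FrozenSet[str]) -> Obs:
    """Observable part of a state (the control variable pi is not observed)."""
    return tuple(sorted((v, state[v]) for v in observed))


def reduced_run(run: List[State], observed: FrozenSet[str]) -> List[Obs]:
    """Definition 2 on a finite prefix; assumed: its last state repeats forever."""
    obs = [observe(s, observed) for s in run]
    reduced = []
    for i, o in enumerate(obs):
        equal_pred = i > 0 and o == obs[i - 1]
        equal_succ = i == len(obs) - 1 or o == obs[i + 1]
        if equal_pred and not equal_succ:
            continue  # stuttering state that does not belong to a terminal suffix
        reduced.append(o)
    return reduced


# S1 sets x to 1, S2 sets x to 2; the second run has a skip between them.
O = frozenset({"x"})
r = [{"pi": "k", "x": 0}, {"pi": "m", "x": 1}, {"pi": "end", "x": 2}]
r_skip = [{"pi": "k", "x": 0}, {"pi": "l", "x": 1},
          {"pi": "m", "x": 1}, {"pi": "end", "x": 2}]
```

Here `reduced_run(r_skip, O)` and `reduced_run(r, O)` both give the observable sequence x = 0, 1, 2.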

Lemma 4 (Skip Congruence in a Wide Sense). Let $P[\_]$ be an arbitrary program context. Then the sets of reduced runs of $P[S_1; S_2]$ and $P[S_1; \mathtt{skip}; S_2]$ are identical. We express this fact by saying that the two concatenation statements are congruent in the wide sense.

Proof. Introduce $m$ as the post-label of $S_1$ in $P[S_1; m: S_2]$ and of skip in $P[S_1; \ell: \mathtt{skip}; m: S_2]$. Also, $\ell$ will be the post-label of $S_1$ in $P[S_1; \ell: \mathtt{skip}; m: S_2]$.

1. Consider a run $r$ of $P[S_1; m: S_2]$. We obtain a run $r'$ of $P[S_1; \ell: \mathtt{skip}; m: S_2]$ by requiring that whenever control reaches $m$, which is now relabeled as $\ell$, a skip transition $\tau_\ell$ is taken immediately. Clearly, $r'$ is a run of $P[S_1; \ell: \mathtt{skip}; m: S_2]$, and the reduced runs of $r$ and $r'$ are identical, since their only differences are at the transitions $\tau_\ell$ which have been introduced in $r'$, corresponding to the skip statement. These transitions have identical initial and final observable states. For each such pair of states in $r'$, the run $r$ has only one state. But the first of the two equal states of such pairs will be deleted when the corresponding reduced run is constructed, as defined above. Then, the reduced runs of $r$ and $r'$ will be equal.

2. The reverse direction is similar. □

Let us describe the deletion of skip transitions $\tau_\ell$ in some detail; the rest of this appendix will need and refer to it. A computation $\sigma$ of $P[S_1; \ell: \mathtt{skip}; m: S_2]$ will have state subsequences of the following form:

$$\ldots, s_{i-f-1}, s_{i-f}, \ldots, s_i, s_{i+1}, \ldots, s_{i+n}, \ldots$$

where $s_{i-f}, \ldots, s_i$ are $\ell$-states, in the sense that the control location corresponding to $\ell$ belongs to $\pi$ in these states, and $s_{i-f-1}$ is not an $\ell$-state. Also, states $s_{i+1}, \ldots, s_{i+n-1}$ are $m$-states, the transition taken at $s_i$ is the skip transition $\tau_\ell$, and $s_{i+n}$ is not an $m$-state. Hence, the last transition corresponding to $S_1$ is taken at state $s_{i-f-1}$, and a front transition of $S_2$ is taken at state $s_{i+n-1}$; it will be referred to later on as transition $\tau'$. Let us now construct the state sequence $\sigma'$ by deleting all $\tau_\ell$ transitions from $\sigma$. This entails replacing $\ell$ by $m$ in all the states of $\sigma$. This sequence of states will have subsequences of the form $\ldots, s_{i-f-1}, s_{i-f}, \ldots, s_i = s_{i+1}, \ldots, s_{i+n}, \ldots$, which correspond to the above subsequences of $\sigma$. The states $s_{i-f}, \ldots, s_i = s_{i+1}$ now become $m$-states by construction; states $s_i$ and $s_{i+1}$ collapse into the same state. The rest of the sequence remains the same: states $s_{i+2}, \ldots, s_{i+n-1}$ continue being $m$-states.

For realistic schedulers we would like the congruence of Lemma 4 to hold in a strict sense, in other words, that for an arbitrary program context $P[\_]$ the sets of reduced behaviors of $P[S_1; m: S_2]$ and of $P[S_1; \ell: \mathtt{skip}; m: S_2]$ be identical. This is not true, due to Remark 1. However, the following lemma expresses the fact that it is true in one direction.

Lemma 5 (Sequential Skip Insertion). Let $P[\_]$ be an arbitrary program context. Then any reduced behavior of $P[S_1; S_2]$ is also a reduced behavior of $P[S_1; \mathtt{skip}; S_2]$.

The proof is not given due to paper length restrictions. It is based on the fact that the positions at which a transition $\tau$ is enabled or taken in $\sigma$ and in $\sigma'$ are the same when only skip transitions are inserted.

The reverse of Lemma 5 is not true since, when deleting skip transitions from a computation, some transition which was not enabled in the final state of the skip transition may become enabled in the corresponding state obtained after the deletion. Hence, the satisfaction of fairness requirements may change for such a transition. The next lemma characterizes the scenarios where this may occur, and the transitions which are involved.

Lemma 6 (Unfairness Scenario). The only way in which $\sigma$, a computation of $P[S_1; \ell: \mathtt{skip}; m: S_2]$, can be fair to a transition $\tau$ while $\sigma'$, the state sequence of $P[S_1; m: S_2]$ constructed by deleting the $\tau_\ell$ transitions as detailed above, is unfair to $\tau$, is that $\tau$ is enabled infinitely often in $\sigma'$ but only finitely often in $\sigma$, and $\tau$ is taken only finitely many times in both.

The proof is based on the fact that the deletion of the skip transitions modifies states $s_{i-f}, \ldots, s_i$. This may enable transitions in these states which were not enabled in the corresponding states of $\sigma$.

Lemma 7 (Compassionate Transition). If the state sequence $\sigma'$ is unfair with respect to a transition $\tau$ which is enabled finitely often in $\sigma$, a computation of $P[S_1; \ell: \mathtt{skip}; m: S_2]$, but infinitely often in $\sigma'$, the state sequence of $P[S_1; m: S_2]$, then the control location corresponding to label $m$ is visited infinitely many times in both $\sigma$ and $\sigma'$. Also, $\tau$ has to be a front transition of $S_2$ and compassionate (strongly fair).

The proof is based mainly on the fact that the new enablings take place in at least one of the states of the sequence $s_{i-f}, \ldots, s_{i-1}$, which is preceded and followed by states where $\tau$ is not enabled; $\tau$ is enabled when control is at the location corresponding to $m$, and these sequences of states are finite.

Two transitions $\tau$ and $\tau'$ are competing if both are directly associated with the same selection statement. In addition, two competing transitions may also correspond to a synchronous communication statement which does not form part of a selection statement, but which gives rise to two joint transitions with matching isolated communication statements parallel to it.

Lemma 8 (Disabling Transition). A $\tau$-disabling transition $\tau''$ occurs in both $\sigma$ and $\sigma'$ in one of the states of their state subsequence $s_{i-f}, \ldots, s_{i-1}$. This transition must compete with transition $\tau$ but must be parallel to the transitions associated with statement $S_2$.

Proof. This has to be so since transition $\tau$ is enabled but not taken only in a subsequence of the state sequence $s_{i-f}, \ldots, s_{i-1}$. Therefore some transition $\tau''$ is taken in one of these states, disabling transition $\tau$. Transition $\tau''$ has to be parallel to the transitions associated with $S_2$, since a front transition of this statement, $\tau'$, is taken in state $s_{i+n-1}$, as pointed out before, and $\tau''$ has to be taken in a prior state without moving control from the location corresponding to $m$. Transition $\tau''$ disables transition $\tau$; hence it must compete with it. □

Lemma 9 (Prohibited Statements). Within the reduced notation, the only possibilities for statement $S_2$ which could give rise to the three transitions $\tau$, $\tau'$ and $\tau''$ identified above are a synchronous communication statement and a communication selection statement.

We note that if this lemma is true then Theorem 1 is also true. The proof is based on a review of the semantics of the notation as defined earlier in the paper. □

References 

1 . R.-J. Back, J. von Wright, Refinement Calculus. A Systematic Introduction. 
Springer- Verlag 1998. 

2. S. Bensalem, M. Bozga, J.C. Fernandez, L. Ghirvu, Y. Lakhnech, A Transforma- 
tional Approach for Generating Non-Linear Invariants. In J. Palsberg (Ed.), Static 
Analysis, Proc. 7th Inti. Symp. SAS 2000, Santa Barbara, CA, USA, June 29 - July 
1, 2000. LNCS Vol. 1824, Springer, 2000, pp. 58-74. 

3. M. Bertran, F. Alvarez-Cuevas, A. Duran, Communication Extended Abstract 
Types in the Refinement of Parallel Communicating Processes, in Transformation- 
Based Reactive Systems Development, LNCS v.1231. Springer, 1997. 

4. N.S. Bjprner, A. Browne, M. Colon, B. Finkbeiner, Z. Manna, H.B. Sipma, and 
T.E. Uribe. Verifying Temporal Properties of Reactive Systems: A STeP Tutorial. 
Formal Methods in System Design, 16, 227-270, June 2000. 

5. N.S. Bjprner, A. Browne, E. Chang, M. Colon, A. Kapur, Z. Manna, H.B. Sipma, 
and T.E. Uribe. STeP: The Stanford Temporal Prover, User’s Manual. Technical 
Report STAN-CS-TR-95-1562, Computer Science Department, Stanford Univer- 
sity, November 1995. 

6. S.D. Brookes. ‘Full abstraction for a shared variable parallel language’. Information 
and Computation, 127(2):145-163, June 1996. 

7. M. Broy, ‘Functional Specification of Time-Sensitive Communicating Systems’, 
ACM Transactions on Software Engineering and Methodology, 2(1),: 1-46, January 
1993. 

8. M. Broy, ‘Refinement of Time’, in M. Bertran and T. Rus (eds.), Transformation- 
Based Reactive Systems Development, Springer- Verlag, Lecture Notes in Computer 
Science 1231, 1997, pp. 44-63. 

9. M. Broy, ‘A Logical Basis for Component-Based Systems Engineering’, Tech. Re- 
port Inst, fr Informatik, Tech. Univ. Munchen, Germany. 

10. K.M. Chandy and J. Misra, Parallel Program Design, Addison Wesley, 1988. 

11. Wei-Ngan Chin, Sian-Cheng Khoo, Z. Hu, M. Takeidu, Deriving Parallel Codes via 
Invariants. In J. Palsberg (Ed.), Static Analysis, Proc. 7th Inti. Symp. SAS 2000, 
Santa Barbara, CA, USA, June 29 - July 1, 2000. LNCS Vol. 1824, Springer, 2000, 
pp. 75-94. 



Communication and Parallelism Introduction and Elimination 



39 



12. E.M. Clarke, O. Grumberg, D.A. Peled, Model Checking, The MIT Press, 1999. 

13. J. Dingel, ‘A Trace-Based Refinement Calculus for Shared- Variable Parallel Pro- 
grams’, in A. Martin Haeberer (Ed.) Algebraic Methodology and Software Technol- 
ogy, AMAST’98, LNCS 1548, Springer- Verlag, pp. 231-247, 1998. 

14. B. Finkbeiner, Z. Manna, H. Sipma, Deductive Verification of Modular Systems. 
In Compositionality: The Significant Difference, COMPOS’97, LNCS v. 1536, pp. 
239-275, Springer 1998. 

15. M. Gordon, A.J. Milner, Ch. P. Wadsworth, Edinburgh LCF, LNCS v.78, Springer- 
Verlag, 1979. 

16. C.A.R. Hoare, ‘Communicating Sequential Processes’, Communications of ACM, 
Vol 21, pp 666-677, 1978. 

17. C.A.R. Hoare, Communicating Seguential Processes, Prentice-Hall, Englewood 
Cliffs, N.J., 1985. 

18. Gerald Holtzmann, Design and Validation of Computer Protocols, Prentice Hall, 
1991. 

19. J. Hooman, ‘Extending Hoare Logic to Real-Time’, Formal Aspects of Computing, 
6A: 801-825, BCS, 1994. 

20. Y. Kesten, Z. Manna, A. Pnueli, ‘Temporal Verification of Simulation and Refine- 
ment’, In REX Symposium A Decade of Concurrency, Lecture Notes in Computer 
Science 803, pp. 273-346, Springer- Verlag, 1994. 

21. L. Lamport, ‘The Temporal Logic of Actions’, ACM Trans. Progr. Lang, and Sys., 
16(3):872-923. 

22. B. Mahony, ‘Using the Refinement Calculus for Dataflow Processes’. Tech. Report 
94-32, Soft. Verification Research Centre, University of Queensland, October 1994. 

23. Z. Manna, A. Pnueli, The Temporal Logic of Reactive and Concurrent Systems. 
Specification. Springer-Verlag, 1991. 

24. Z. Manna, A. Pnueli, Temporal Verification of Reactive Systems. Safety. 
Springer-Verlag, 1995. 

25. K.L. McMillan, and D.L. Dill, Symbolic Model Checking: An Approach to the State 
Explosion Problem, Kluwer Academic, 1993. 

26. R. Milner, A Calculus of Communicating Systems, Springer-Verlag, 1980. 

27. R. Milner, Communication and Concurrency, Prentice-Hall 1989. 

28. M. Müller-Olm, D.A. Schmidt, B. Steffen, Model Checking: A Tutorial Introduction. 
In A. Cortesi, G. Filé (Eds.), Static Analysis, Proc. 6th Intl. Symp. SAS’99, Venice, 
Italy, September 22-24, 1999. LNCS, Vol 1694, Springer, 1999, pp. 330-354. 

29. Wei-Ngan Chin, Siau-Cheng Khoo, Z. Hu, M. Takeichi, Deriving Parallel Codes via 
Invariants. In J. Palsberg (Ed.), Static Analysis, Proc. 7th Intl. Symp. SAS 2000, 
Santa Barbara, CA, USA, June 29 - July 1, 2000. LNCS Vol. 1824, Springer, 2000, 
pp. 75-94. 

30. A.V. Oppenheim, R.W. Schafer, Digital Signal Processing, Prentice Hall, N.J., 1975. 

31. A. Papoulis, Signal Analysis, McGraw-Hill, N.Y., 1977. 

32. A. Podelski, Model Checking as Constraint Solving. In J. Palsberg (Ed.), Static 
Analysis, Proc. 7th Intl. Symp. SAS 2000, Santa Barbara, CA, USA, June 29 - 
July 1, 2000. LNCS Vol. 1824, Springer, 2000, pp. 22-37. 

33. L.R. Rabiner, B. Gold, Theory and Application of Digital Signal Processing, 
Prentice Hall, N.J., 1975. 




Using Slicing to Identify Duplication 
in Source Code 



Raghavan Komondoor¹ and Susan Horwitz¹,² 



¹ Computer Sciences Department 
University of Wisconsin-Madison 
Madison, WI 53706 USA 
{raghavan, horwitz}@cs.wisc.edu 
² IEI del CNR, Pisa, Italy 



Abstract. Programs often have a lot of duplicated code, which makes 
both understanding and maintenance more difficult. This problem can be 
alleviated by detecting duplicated code, extracting it into a separate new 
procedure, and replacing all the clones (the instances of the duplicated 
code) by calls to the new procedure. This paper describes the design and 
initial implementation of a tool that finds clones and displays them to 
the programmer. The novel aspect of our approach is the use of program 
dependence graphs (PDGs) and program slicing to find isomorphic PDG 
subgraphs that represent clones. The key benefits of this approach are 
that our tool can find non-contiguous clones (clones whose components 
do not occur as contiguous text in the program), clones in which match- 
ing statements have been reordered, and clones that are intertwined with 
each other. Furthermore, the clones that are found are likely to be mean- 
ingful computations, and thus good candidates for extraction. 



1 Introduction 

Programs under ongoing development and maintenance often contain a lot of 
duplicated code. The results of several studies indicate that 7-23% of the 
source code for large programs is duplicated code. Duplication results 
in increased code size and complexity, making program maintenance more dif- 
ficult. For example, when enhancements or bug fixes are done on one instance 
of the duplicated code, it may be necessary to search for the other instances in 
order to perform the corresponding modification. Lague et al. [13] studied the 
development of a large software system over multiple releases and found that 
programmers did in fact often miss some copies of duplicated code when 
performing modifications. 

A tool that finds clones (instances of duplicated code) can help alleviate these 
problems: the clones identified by the tool can be extracted into a new procedure, 
and the clones themselves replaced by calls to that procedure. In that case, there 
will be only one copy to maintain (the new procedure), and the fact that the 
procedure can be reused may cut down on future duplication. (Note that for a 
language like C with a preprocessor, macros can be used instead of procedures 
if there is a concern that introducing procedures will result in unacceptable 
performance degradation.) 

For an example illustrating clone detection and extraction, see Figure 1. The 
left column shows four fragments of code from the Unix utility bison. The four 
clones are indicated by “++” signs. The function of the duplicated code is 
to grow the buffer pointed to by p if needed, append the current character c to 
the buffer, and then read the next character. In the right column, the duplicated 
code has been extracted into a new procedure next_char, indicated by “++” 
signs, and all four clones have been replaced by calls to this procedure. The 
four calls are indicated by “**” signs. 

This paper describes the design and initial implementation of a tool for C 
programs that finds clones suitable for procedure extraction and displays them 
to the programmer. The novel aspect of the work is the use of program 
dependence graphs (PDGs) and a variation on program slicing to find 
isomorphic subgraphs of the PDG that represent clones. The key benefits of a 
slicing-based approach, compared with previous approaches to clone detection 
that were based on comparing text, control-flow graphs, or abstract-syntax trees, 
are that our tool can find non-contiguous clones (i.e., clones whose statements do 
not occur as contiguous text in the program, such as in Fragments 1 and 2 in 
Figure 1), clones in which matching statements have been reordered, and clones 
that are intertwined with each other. Furthermore, the clones found using this 
approach are likely to be meaningful computations, and thus good candidates 
for extraction. 

The remainder of this paper is organized as follows: Section 2 describes how 
our tool uses slicing to find clones, and the benefits of this approach. Section 3 
describes an implementation of our tool, and some of the insights obtained from 
running the tool on real programs. Section 4 discusses related work, and Section 5 
summarizes our results. 



2 Slicing-Based Clone Detection 

2.1 Algorithm Description 

To find clones in a program, we represent each procedure using its program 
dependence graph (PDG). In the PDG, nodes represent program statements 
and predicates, and edges represent data and control dependences. The algorithm 
performs three steps (described in the following subsections): 

Step 1: Find pairs of clones. 

Step 2: Remove subsumed clones. 

Step 3: Combine pairs of clones into larger groups. 



Fig. 1. Duplicated Code from bison. 

Step 1: Find Pairs of Clones. We start by partitioning all PDG nodes into 
equivalence classes based on the syntactic structure of the statement/predicate 
that the node represents, ignoring variable names and literal values; two nodes 
in the same class are called matching nodes. Next, for each pair of matching 
nodes (r1, r2), we find two isomorphic subgraphs of the PDGs that contain r1 
and r2. 
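The partitioning in Step 1 can be pictured as computing a normalized key for each statement or predicate and grouping nodes whose keys are equal. The C sketch below is only an illustration under stated assumptions, not the tool's implementation (which works on CodeSurfer's PDG nodes rather than on source text): the helper name `normalize_stmt`, the token-level treatment, and the short keyword list are hypothetical. Every non-keyword identifier (function names included) becomes `v`, numeric literals become `n`, string and character literals become `s`, whitespace is dropped, and all other characters are kept.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keywords keep their spelling, so an if predicate never matches a while
   predicate. This short list is an illustrative assumption. */
static int is_keyword(const char *w, size_t len) {
    static const char *const kw[] = {"if", "else", "while", "for", "do",
                                     "return", "goto", "switch", "case",
                                     "break", "continue", "sizeof", NULL};
    for (size_t i = 0; kw[i] != NULL; i++)
        if (strlen(kw[i]) == len && strncmp(kw[i], w, len) == 0)
            return 1;
    return 0;
}

/* Appends len bytes of s to out, keeping it NUL-terminated; drops the
   token if it would not fit. */
static void append(char *out, size_t *k, size_t cap, const char *s, size_t len) {
    if (*k + len < cap) {
        memcpy(out + *k, s, len);
        *k += len;
        out[*k] = '\0';
    }
}

/* Hypothetical Step 1 key: identifiers -> "v", numbers -> "n",
   string/char literals -> "s", whitespace dropped, other chars kept. */
void normalize_stmt(const char *src, char *out, size_t cap) {
    size_t k = 0;
    const char *p = src;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);
            if (is_keyword(start, len))
                append(out, &k, cap, start, len);
            else
                append(out, &k, cap, "v", 1);
        } else if (isdigit((unsigned char)*p)) {
            while (isalnum((unsigned char)*p) || *p == '.')
                p++;
            append(out, &k, cap, "n", 1);
        } else if (*p == '"' || *p == '\'') {
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                    /* skip the escaped character */
                p++;
            }
            if (*p == quote)
                p++;
            append(out, &k, cap, "s", 1);
        } else {
            append(out, &k, cap, p, 1);
            p++;
        }
    }
}
```

With this key, `*p++ = c;` and `*q++ = d;` both normalize to `*v++=v;` and land in the same class, while `while (fp2 < fp3)` and `if (fp2 < fp3)` do not, because keywords keep their spelling.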

The heart of the algorithm that finds the isomorphic subgraphs is the use 
of backward slicing: starting from r1 and r2, we slice backward in lock step, 
adding a predecessor (and the connecting edge) to one slice iff there is a corre- 
sponding, matching predecessor in the other PDG (which is added to the other 
slice). Forward slicing is also used: whenever a pair of matching loop or 
if-then-else predicates (p1, p2) is added to the pair of slices, we slice forward 
one step from p1 and p2, adding their matching control-dependence successors 
(and the connecting edges) to the two slices. Note that while lock-step backward 
slicing is done from every pair of matching nodes in the two slices, forward 
slicing is done only from matching predicates. An example illustrating the need 
for this kind of limited forward slicing is given in Section 2.2. 

When the process described above finishes, it will have identified two isomor- 
phic subgraphs (two matching “partial” slices) that represent a pair of clones. 
The process is illustrated using Figure 2, which shows the PDGs for the first 
two code fragments from Figure 1. (Function calls are actually represented in 
PDGs using multiple nodes: one for each actual parameter, one for the return 
value, and one for the call itself. For clarity, in this example we have treated 
function calls as atomic operations.) Nodes 3a and 3b match, so we can start 
with those two nodes. Slicing backward from nodes 3a and 3b along their in- 
coming control-dependence edges, we find nodes 5 and 8 (the two while nodes). 
However, these nodes do not match (they have different syntactic structure), so 
they are not added to the partial slices. Slicing backward from nodes 3a and 3b 
along their incoming data-dependence edges, we find nodes 2a, 3a, 4a, and 7 in 
the first PDG, and nodes 2b, 3b, and 4b in the second PDG. Node 2a matches 2b, 
and node 4a matches 4b, so those nodes (and the edges just traversed to reach 
them) are added to the two partial slices. (Nodes 3a and 3b have already been 
added, so they are not reconsidered.) Slicing backward from nodes 2a 
and 2b, we find nodes 1a and 1b, which match, so they (and the traversed edges) 
are added. Furthermore, nodes 1a and 1b represent if predicates; therefore we 
slice forward from those two nodes. We find nodes 2a and 2b, which are already 
in the slices, so they are not reconsidered. Slicing backward from nodes 4a and 
4b, we find nodes 5 and 8, which do not match; the same two nodes are found 
when slicing backward from nodes 1a and 1b. 

The partial slices are now complete. The nodes and edges in the two partial 
slices are shown in Figure 2 using bold font. These two partial slices correspond 
to the clones of Fragments 1 and 2 shown in Figure 1 using “++” signs. 

Fig. 2. Matching partial slices starting from *p++ = c;. The nodes and edges 
in the partial slices are shown in bold. 



Step 2: Remove Subsumed Clones. A clone pair $(S_1', S_2')$ subsumes another 
clone pair $(S_1, S_2)$ iff $S_1 \subseteq S_1'$ and $S_2 \subseteq S_2'$. There is 
no reason for the tool to report subsumed clone pairs; therefore, this step 
removes subsumed clone pairs from the set of pairs identified in Step 1. 
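Step 2 reduces to subset tests on the two sides of each pair. The sketch below assumes, purely for illustration, that each clone is a set of PDG node ids below 64 stored as a bit mask; `ClonePairSet`, `subsumes`, and `remove_subsumed` are hypothetical names. When several identical pairs occur, the sketch keeps the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoding: each clone is a set of PDG node ids < 64,
   stored as a bit mask. */
typedef struct {
    uint64_t s1, s2;
} ClonePairSet;

/* a is a subset of b */
static int is_subset(uint64_t a, uint64_t b) {
    return (a & ~b) == 0;
}

/* q subsumes p iff p.s1 is a subset of q.s1 and p.s2 a subset of q.s2 */
int subsumes(ClonePairSet q, ClonePairSet p) {
    return is_subset(p.s1, q.s1) && is_subset(p.s2, q.s2);
}

/* Copies into out (room for n entries) every pair that no other pair
   subsumes; of several identical pairs only the first is kept.
   Returns the number of pairs written. */
size_t remove_subsumed(const ClonePairSet *pairs, size_t n, ClonePairSet *out) {
    size_t kept = 0;
    assert(n == 0 || (pairs != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        int dominated = 0;
        for (size_t j = 0; j < n && !dominated; j++) {
            if (j == i || !subsumes(pairs[j], pairs[i]))
                continue;
            /* j is strictly larger, or an identical pair that comes earlier */
            if (!subsumes(pairs[i], pairs[j]) || j < i)
                dominated = 1;
        }
        if (!dominated)
            out[kept++] = pairs[i];
    }
    return kept;
}
```

The quadratic scan is the simplest correct choice for a sketch; writing into a separate output array avoids overwriting pairs that later comparisons still need.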



Step 3: Combine Pairs of Clones into Larger Groups. This step combines 
clone pairs into clone groups using a kind of transitive closure. For example, clone 
pairs $(S_1, S_2)$, $(S_1, S_3)$, and $(S_2, S_3)$ would be combined into the clone group 
$(S_1, S_2, S_3)$. 
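The paper describes Step 3 only as “a kind of transitive closure.” One plausible reading, sketched below under that assumption, treats clones as vertices and clone pairs as edges and takes connected components with a union-find structure. It assumes each distinct clone has already been given an integer id; `PairIds` and `group_clones` are hypothetical names, and the tool's actual grouping rule may be more restrictive (union-find would, for instance, also merge the pairs (S1, S2) and (S2, S3) even if the pair (S1, S3) was never found).

```c
#include <assert.h>
#include <stddef.h>

/* A clone pair, given as the integer ids of its two clones. */
typedef struct {
    int a, b;
} PairIds;

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* On return, group[i] holds a representative id for clone i; clones
   connected by a chain of pairs share the same representative. */
void group_clones(const PairIds *pairs, size_t npairs, int nclones, int *group) {
    assert(nclones >= 0 && group != NULL);
    for (int i = 0; i < nclones; i++)
        group[i] = i;                          /* every clone starts alone */
    for (size_t k = 0; k < npairs; k++) {
        assert(pairs[k].a < nclones && pairs[k].b < nclones);
        int ra = find_root(group, pairs[k].a);
        int rb = find_root(group, pairs[k].b);
        if (ra != rb)
            group[ra] = rb;                    /* merge the two groups */
    }
    for (int i = 0; i < nclones; i++)
        group[i] = find_root(group, i);        /* flatten to final ids */
}
```

With the pairs (0, 1), (0, 2), (1, 2), and (3, 4), clones 0, 1, and 2 end up with one representative and clones 3 and 4 with another, mirroring the three-clone example above.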

2.2 Need for Forward Slicing 

Our first implementation of the clone-detection tool did not include any for- 
ward slicing. However, when we looked at the clones that it found we saw that 
they sometimes were subsets of the clones that a programmer would have identi- 
fied manually. In particular, we observed that conditionals and loops sometimes 
contain code that a programmer would identify as all being part of one logical 
operation, but that is not the result of a backward slice from any single node. 

One example of this situation is error-handling code, such as the two 
fragments in Figure 3 from the Unix utility tail. The two fragments are identical 
except for the target of the final goto, and are reasonable candidates for extrac- 
tion; they both check for the same error condition, and if it holds, they both 
perform the same sequence of actions: calling the error procedure, setting the 
global error variable, and freeing variable tmp. (The final goto should of course 
not be part of the extracted procedure; instead, that procedure would need to 
return a boolean value to specify whether or not the goto should be executed.) 
However, the two fragments cannot be identified as clones using only backward 
slicing, since the backward slice from any statement inside the if fails to include 
any of the other statements in the if. It is the forward-slicing step from the 
pair of matched if predicates that allows our tool to identify these two code 
fragments as clones. 
Fragment 1: 

    if (tmp->nbytes == -1) 
      { 
        error (0, errno, "%s", filename); 
        errors = 1; 
        free ((char *) tmp); 
        goto free_lbuffers; 
      } 

Fragment 2: 

    if (tmp->nbytes == -1) 
      { 
        error (0, errno, "%s", filename); 
        errors = 1; 
        free ((char *) tmp); 
        goto free_cbuffers; 
      } 

Fig. 3. Error-handling code from tail that motivates the use of forward slicing. 



Other examples where forward slicing is needed include loops that set the 
values of two related but distinct variables (e.g., the head and tail pointers of a 
linked list). In such examples, although the entire loop corresponds to a single 
logical operation, backward slicing alone is not sufficient to identify the whole 
loop as a clone. 

2.3 Preventing Clones that Cross Loops 

Based on experience gained from applying the algorithm to real programs, we 
found that we needed a heuristic to prevent clones that “cross” loops, i.e., clones 
that include nodes both inside and outside a loop but not the loop itself. To 
illustrate this, consider the two code fragments (from bison) in Figure 4. The 
clones identified by our tool are shown using “++” signs. Each of these clones 
modifies a portion of a bit array (lookaheadset / base) by performing a bit-wise 
or with the contents of another array (LA / F). The clones are identified by 
slicing back from the statement that does the bit-wise or. Note that the two 
initial assignments to fp3 are matching statements that are data-dependence 
predecessors of matching nodes in the two clones (the nodes that represent the 
final while predicates). Therefore, the algorithm as described in Section 2.1 
would have included the two initial assignments in the clones. It would not, 
however, have included the for loop in the first fragment or the outer while 
loop in the second fragment, because the predicates of those loops do not 
match. 

Fragment 1: 

       fp3 = lookaheadset + tokensetsize; 
       for (i = lookaheads (state); i < k; i++) { 
    ++   fp1 = LA + i * tokensetsize; 
    ++   fp2 = lookaheadset; 
    ++   while (fp2 < fp3) 
    ++     *fp2++ |= *fp1++; 
       } 

Fragment 2: 

       fp3 = base + tokensetsize; 
       if (rp) { 
         while ((j = *rp++) >= 0) { 
    ++     fp1 = base; 
    ++     fp2 = F + j * tokensetsize; 
    ++     while (fp1 < fp3) 
    ++       *fp1++ |= *fp2++; 
         } 
       } 

Fig. 4. Two clones from bison that illustrate the heuristic that avoids “crossing” 
a loop. These clones also illustrate variable renaming and statement reordering. 

The resulting clones would therefore contain the statements inside the loops 
and the assignments outside the loops, but not the loops themselves. This would 
make it difficult to extract the clones into a separate procedure. To prevent the 
algorithm from identifying “difficult” clones like these, we use a heuristic during 
the backward slicing step: when slicing back from two nodes that are inside loops, 
we add to the partial slices predecessor nodes that are outside the loops only if 
the loop predicates match (and so will also be added to the partial slices). That 
is why, as indicated in Figure 4, the initial assignments to fp3 are not included 
in the clones identified by the tool. 
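The loop-crossing heuristic can be phrased as a check applied before a matched pair of predecessors is added. In this sketch the following are assumptions, not the tool's data structures: each node records its innermost enclosing loop predicate in a hypothetical `loop_of` array, and a loop predicate counts as part of its own loop. The two dependence edges being followed may leave loops only in lock step, and each pair of loops left behind must have predicates of the same Step 1 class (so that those predicates can also join the slices).

```c
#include <assert.h>

#define MAXN 32

/* Illustrative loop-nesting information for one PDG. */
typedef struct {
    int n;                 /* number of nodes */
    int cls[MAXN];         /* Step 1 equivalence class of each node */
    int loop_of[MAXN];     /* innermost enclosing loop predicate, or -1 */
} LoopInfo;

/* Does loop predicate l enclose node u? A predicate belongs to its own loop. */
static int encloses(const LoopInfo *g, int l, int u) {
    if (u == l)
        return 1;
    for (int x = g->loop_of[u]; x != -1; x = g->loop_of[x])
        if (x == l)
            return 1;
    return 0;
}

/* May the matched predecessors u1, u2 of the matched nodes v1, v2 be added?
   Walk outward through the loops around v1 and v2 in lock step: every loop
   the dependence edge leaves (it encloses v but not u) must be paired with a
   loop left on the other side whose predicate is in the same class. */
int may_cross(const LoopInfo *g1, int v1, int u1,
              const LoopInfo *g2, int v2, int u2) {
    assert(v1 >= 0 && v1 < g1->n && v2 >= 0 && v2 < g2->n);
    int l1 = g1->loop_of[v1], l2 = g2->loop_of[v2];
    for (;;) {
        int leaves1 = (l1 != -1 && !encloses(g1, l1, u1));
        int leaves2 = (l2 != -1 && !encloses(g2, l2, u2));
        if (!leaves1 && !leaves2)
            return 1;                     /* no further loops are crossed */
        if (leaves1 != leaves2)
            return 0;                     /* only one side leaves a loop */
        if (g1->cls[l1] != g2->cls[l2])
            return 0;                     /* loop predicates do not match */
        l1 = g1->loop_of[l1];
        l2 = g2->loop_of[l2];
    }
}
```

For Figure 4, following the data edge from the final while predicate back to the assignment to fp3 leaves the for loop in Fragment 1 and the outer while loop in Fragment 2; their predicates are in different classes, so `may_cross` returns 0 and the assignments to fp3 stay out of the clones.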

2.4 Benefits of the Approach 

As stated in the Introduction, the major benefits of a slicing-based approach to 
clone detection are the ability to find non-contiguous, reordered, and intertwined 
clones, and the likelihood that the clones that are found are good candidates for 
extraction. These benefits, discussed in more detail below, arise mainly because 
slicing is based on the PDG, which provides an abstraction that ignores arbitrary 
sequencing choices made by the programmer, and instead captures the important 
dependences among program components. In contrast, most previous approaches 
to clone detection used the program text, its control-flow graph, or its 
abstract-syntax tree, all of which are more closely tied to the (sometimes 
irrelevant) lexical structure. 

Finding Non-contiguous, Reordered, and Intertwined Clones: One ex- 
ample of non-contiguous clones identified by our tool was given in Figure 1. By 
running a preliminary implementation of the proposed tool on some real pro- 
grams, we have observed that non-contiguous clones that are good candidates 
for extraction (like the ones in Figure 1) occur frequently (see Section 3 for fur- 
ther discussion). Therefore, the fact that our approach can find such clones is a 
significant advantage over most previous approaches to clone detection. 

Non-contiguous clones are a kind of near duplication. Another kind of near 
duplication occurs when the ordering of matching nodes is different in the dif- 
ferent clones. The two clones shown in Figure 4 illustrate this. The clone in 
Fragment 2 differs from the one in Fragment 1 in two ways: the variables have 
been renamed (including renaming fp1 to fp2 and vice versa), and the order of 
the first and second statements (in the clones, not in the fragments) has been 
reversed. This renaming and reordering does not affect the data or control de- 
pendences; therefore, our approach finds the clones as shown in the figure, with 
the first and second statements in Fragment 1 that are marked with “++” signs 
matching the second and first statements in Fragment 2 that are marked with 
“++” signs. 

The use of program slicing is also effective in finding intertwined clones. An 
example from the Unix utility sort is given in Figure 5. In this example, one 
clone is indicated by “++” signs while the other clone is indicated by “xx” 
signs. The clones take a character pointer (a/b) and advance the pointer past all 
blank characters, also setting a temporary variable (tmpa/tmpb) to the first 
non-blank character. The final component of each clone is an if predicate 
that uses the temporary. The predicates were the starting points of the slices 
used to find the two clones (the second one, on the second-to-last line of code in 
the figure, occurs 43 lines further down in the code). 

    ++ tmpa = UCHAR(*a), 
    xx tmpb = UCHAR(*b); 
    ++ while (blanks[tmpa]) 
    ++   tmpa = UCHAR(*++a); 
    xx while (blanks[tmpb]) 
    xx   tmpb = UCHAR(*++b); 
    ++ if (tmpa == '-') { 
         tmpa = UCHAR(*++a); 
         ... 
       } 
    xx else if (tmpb == '-') { 
         if (...UCHAR(*++b)...) ... 

Fig. 5. An Intertwined Clone Pair from sort. 



Finding Good Candidates for Extraction: As discussed in the Introduc- 
tion, the goal of our current research is to design a tool to help find clones to 
be extracted into new procedures. In this context, a good clone is one that is 
meaningful as a separate procedure (functionally) and that can be extracted out 
easily without changing program semantics. The proposed approach to finding 
clones is likely to satisfy both these criteria as discussed below. 

Meaningful clones: In order for a code fragment to be meaningful as a sep- 
arate procedure, it should perform a single conceptual operation (be highly co- 
hesive [17]). That means it should compute a small number of outputs (outputs 
include values assigned to global variables and through pointer parameters, the 
value returned, and output streams written). Furthermore, all the code to be ex- 
tracted should be relevant to the computation of the outputs (i.e., the backward 
slices from the statements that assign to the outputs should include the entire 
clone). 

A partial slice obtained using backward slicing has a good chance of being 
cohesive because we start out from a single node and include only nodes that 
are relevant to that node’s computation. However, in addition to being cohe- 
sive, a meaningful procedure should be “complete”. In practice, we have found 
that there are examples (like the one in Figure 3) where backward slicing alone 
omits some relevant statements. Our use of forward slicing seems to address this 
omission reasonably well. 

Extractable clones: A group of clones cannot be eliminated by procedure 
extraction if it is not possible to replace the clones by calls to the extracted 
procedure without changing program semantics. Such clone groups are said to 
be inextractable. Since semantic equivalence is, in general, undecidable, it is not 
always possible to determine whether a group of clones is extractable. In previous 
work we identified sufficient conditions under which a single, non-contiguous 
clone can be extracted by first moving its statements together (making it 
contiguous), then creating a new procedure using the contiguous statements, and 
finally replacing the clone with a call to the new procedure. 

In the example in Figure 1, the duplicated code indicated by the “++” signs 
meets these sufficient conditions for extractability. However, in the same example, 
if we wanted each clone to consist of just the two lines indicated by “++” signs 
below, we would face problems: 

    ++ if (p == token_buffer + maxtoken) 
         p = grow_token_buffer(p); 

    ++ *p++ = c; 

There is no obvious way to extract just these two lines, because the statement 
p = grow_token_buffer(p) cannot be moved out from between them without 
affecting data and control dependences (and hence without affecting semantics). 
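A crude way to flag such a “dependency gap” is to look for a node outside the candidate clone that lies on a dependence path between two clone nodes. The sketch below is not the extractability test of the authors' earlier work, which is more refined; it is an illustrative check only, and both the name `has_dependence_gap` and the single adjacency matrix combining data and control edges are assumptions. Because loop-carried dependences create cycles, a test like this can be overly conservative for code inside loops.

```c
#include <assert.h>
#include <string.h>

#define GN 32

/* Illustrative check: is some node outside the clone both reachable from a
   clone node and able to reach a clone node, following edge[u][v] = 1 for a
   data or control dependence u -> v? Paths are explored through non-clone
   nodes only, so such a node sits inside a "dependency gap". */
int has_dependence_gap(int n, unsigned char edge[GN][GN], const int *in_clone) {
    int from_clone[GN], to_clone[GN], stack[GN], top = 0;
    assert(n >= 0 && n <= GN);
    memset(from_clone, 0, sizeof from_clone);
    memset(to_clone, 0, sizeof to_clone);

    for (int s = 0; s < n; s++)           /* forward search from the clone */
        if (in_clone[s])
            stack[top++] = s;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < n; v++)
            if (edge[u][v] && !in_clone[v] && !from_clone[v]) {
                from_clone[v] = 1;
                stack[top++] = v;
            }
    }

    for (int s = 0; s < n; s++)           /* backward search into the clone */
        if (in_clone[s])
            stack[top++] = s;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < n; v++)
            if (edge[v][u] && !in_clone[v] && !to_clone[v]) {
                to_clone[v] = 1;
                stack[top++] = v;
            }
    }

    for (int v = 0; v < n; v++)
        if (from_clone[v] && to_clone[v])
            return 1;
    return 0;
}
```

For the two-line candidate above (the if predicate and *p++ = c, with p = grow_token_buffer(p) left out), the omitted assignment is reachable from the predicate and reaches the store, so the check reports a gap; adding the assignment to the candidate removes it.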

Because backward slicing follows dependence edges in the PDG, it is more 
likely to avoid creating a “dependency gap” (e.g., including the statement *p++ = 
c and its dependence grandparent if (p == token_buffer + maxtoken), but 
omitting its dependence parent p = grow_token_buffer (p)) than a text- or 
syntax-tree based algorithm that detects non-contiguous clones. The heuristic 
described in Section 2.3 is another aspect of our approach that helps avoid 
identifying inextractable clones. 



3 Experimental Results 

We have implemented a preliminary version of our proposed tool to find clones 
in C programs using the slicing-based approach described above. Our imple- 
mentation uses CodeSurfer to process the source code and build the PDGs. 
CodeSurfer also provides a GUI to display the clone groups identified by the tool 
using highlighting. 

The implementation of Step 1 of the algorithm (finding clone pairs) is done 
in Scheme, because CodeSurfer provides a Scheme API to the PDGs. The other 
two steps of the algorithm (eliminating subsumed clone pairs, and combining 
clone pairs into clone groups) are done in C++. 
We have run the tool on three Unix utilities (bison, sort, and tail) and on four 
files from a graph-layout program used in-house by IBM. The results of these 
experiments are presented in the following subsections. 



3.1 Unix Utilities 

Figure 6 gives the sizes of the three Unix utilities (in lines of source code and 
in number of PDG nodes), and the running times for the three steps of the 
algorithm. Figure 7 presents the results of running the tool on those three pro- 
grams; for each of eight clone size ranges, three sets of numbers are reported: 
the number of clone groups identified that contain clones of that size, and the 
max and mean numbers of clones in those groups (the median number of clones 
in the groups of each size range was always two). Our experience indicates that 
clones with fewer than five PDG nodes are too small to be good candidates for 
extraction, so they are ignored by our tool. 
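The reporting convention used for Figure 7 (ignore clones with fewer than five PDG nodes, then place the rest into at most eight size ranges: 5-9, 10-19, and so on) can be written as a small helper. Capping the index at the eighth range is an assumption based on the last range in Figure 7 extending up to the largest clone (e.g., 70-85 for tail); the function and constant names are hypothetical.

```c
#include <assert.h>

#define MIN_CLONE_NODES 5   /* smaller clones are ignored, as in the text */
#define NUM_RANGES 8        /* eight size ranges, as in Figure 7 */

/* Size-range index for a clone of `nodes` PDG nodes: -1 below the
   threshold, 0 for 5-9, 1 for 10-19, ..., with every clone of 70 or
   more nodes placed in the last range. */
int size_range(int nodes) {
    assert(nodes >= 0);
    if (nodes < MIN_CLONE_NODES)
        return -1;
    int r = nodes < 10 ? 0 : nodes / 10;
    return r < NUM_RANGES - 1 ? r : NUM_RANGES - 1;
}
```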





                      Program Size              Running Times (elapsed time) 
           # of lines   # of PDG   find clone pairs   eliminate subsumed    combine pairs into 
Program    of source    nodes      (Scheme)           clone pairs (C++)     groups (C++) 
bison      11,540       28,548     1:33 hours         15 sec.               50 sec. 
sort       2,445        5,820      10 min.            5 sec.                2 sec. 
tail       1,569        2,580      40 sec.            1 sec.                2 sec. 

Fig. 6. Unix Program Sizes and Running Times. 
Clone Size Ranges (# of PDG nodes) 

sort 
  size range                  5-9    10-19   20-29   30-39   40-48 
  # clone groups              105    57      30      9       14 
  max # clones in a group     17     8       6       3       2 
  mean # clones in a group    3.0    2.8     2.4     2.1     2 

tail 
  size range                  5-9    10-19   20-29   30-39   40-49   50-59   60-69   70-85 
  # clone groups              21     4       0       0       4       1       0       2 
  max # clones in a group     12     8       -       -       3       2       -       2 
  mean # clones in a group    3.2    3.5     -       -       2.3     2       -       2 

bison 
  mean # clones in a group, by increasing size range: 3.7, 2.8, 3.3, 2, 2, 2, 2, 2.1 

Fig. 7. Results of Running the Tool. 



When run on the Unix utilities, the tool found a number of interesting clones, 
many of which are non-contiguous and some of which involve reordering and 
intertwining. These preliminary results seem to validate both the hypothesis 
that programs often include a significant amount of “near” duplication, and the 
potential of the proposed approach to find good quality clones. 

Some examples of the interesting clones identified by the tool are listed below. 

– The four-clone group shown in Figure 1, from bison. 

– The two clones shown in Figure 4, from bison. These were part of a three- 
clone group. The third clone involved a different renaming of variables, and 
used the same statement ordering as the clone in Fragment 1. 

– The pair of intertwined clones shown in Figure 5, from sort. 

– A group of seven clones from bison, identical except for variable names. Two 
of the clones are shown in Figure 8. The clones were found by slicing back 
from the statement putc(',', ftable). This code prints the contents of an 
array (check / rrhs), ten entries to a line, separated by commas. 



Fragment 1: 

    ++ j = 10; 
    ++ for (i = 1; i < high; i++) { 
    ++   putc(',', ftable); 
    ++   if (j >= 10) { 
    ++     putc('\n', ftable); 
    ++     j = 1; 
    ++   } 
    ++   else 
    ++     j++; 
    ++   fprintf(ftable, "%6d", check[i]); 
    ++ } 

Fragment 2: 

    ++ j = 10; 
    ++ for (i = 1; i < nrules; i++) { 
    ++   putc(',', ftable); 
    ++   if (j >= 10) { 
    ++     putc('\n', ftable); 
    ++     j = 1; 
    ++   } 
    ++   else 
    ++     j++; 
    ++   fprintf(ftable, "%6d", rrhs[i]); 
    ++ } 

Fig. 8. Seven Copies of this Clone Were Found in bison. 



One limitation of the tool is that it often finds variants of the “ideal” clones 
(the clones that would be identified by a human) rather than finding exactly the 
ideal clones themselves. To illustrate this, consider the example in Figure 5. In 
that example, the ideal clones would not include the final if predicates; therefore, 
the clones found by the tool (which do include those predicates) are variants of 
the ideal ones. In the same example fragment, the tool also identifies a second 
pair of clones that is a slightly different variant of the ideal pair: this second 
pair includes everything in the ideal clones, does not include the if predicates, 
but does include the expressions UCHAR(*++a) and UCHAR(*++b) that occur in 
the last and fifth-to-last lines of code (the lines not marked with “++” or “xx” 
signs). 

To further evaluate the tool we performed two studies, described below. The 
goals of the studies were to understand better: 
a. whether the tool is likely to find (variants of) all of the ideal clones; 

b. to what extent the tool finds multiple variants of the ideal clones rather than 
exactly the ideal ones; 

c. how many “uninteresting” clones the tool finds (i.e., clones that are not 
variants of any ideal clone), and how large those clones are; 

d. how often non-contiguous clones, intertwined clones, and clones that involve 
statement reordering and variable renaming occur in practice. 

For the first study, we examined one file (lex.c) from bison by hand, and 
found four ideal clone groups. We then ran the tool on lex.c, and it identified 
forty-three clone groups. Nineteen of those groups were variants of the ideal clone 
groups (including several variants for each of the four ideal groups, so no ideal 
clones were missed by the tool), and the other twenty-four were uninteresting. 
More than half of the uninteresting clone groups (13 out of 24) had clones with 
fewer than 7 nodes (which was the size of the smallest ideal clone); the largest 
uninteresting clone had 9 nodes. 

For the second study we examined all 25 clone groups identified by the tool 
for bison in the size range 30-49 (we chose an intermediate clone size in order 
to test the hypothesis that the uninteresting clones identified by the tool tend 
to be quite small). All but one of those 25 groups were variants of 9 ideal clone 
groups (i.e., only one of them was uninteresting). 

In the two studies, we encountered a total of 11 ideal clone groups (two groups 
showed up in both studies) containing a total of 37 individual clones. Of those 
37, 10 were non-contiguous. Two of the 11 ideal clone groups involved statement 
reordering, five involved variable renaming, and none involved intertwined clones. 

3.2 IBM Code 

The goals of the experiments using the IBM code were: 

a. to see whether this code also contained non-contiguous, reordered, and in- 
tertwined clones; 

b. to gather some quantitative data on the immediate effects of extracting 
clones. 

Due to limitations of CodeSurfer, we were not able to process the entire IBM 
program. Therefore, we selected four (out of the 70+ files) and ran the tool on 
each of those files individually. The larger clones found by the tool were then 
examined manually (about 250 clone groups were examined), and the clones 
best-suited for extraction were identified. The “ideal” versions of those clones 
were (manually) extracted into macro definitions, which were placed in the same 
files, and each instance of a clone was replaced by a macro call. (Macros rather 
than procedures were used to avoid changing the running time of the program.) 
A total of 30 clone groups containing 77 “ideal” clones were extracted. 

The results of the study are summarized in Figure 9, which gives, for each 
file: 
– the size (in lines of source code and in number of PDG nodes); 

– the running time for the tool (in all four cases, Step 1 of the algorithm, 
finding clone pairs, accounted for at least 90% of the running time); 

– the number of clone groups that were extracted; 

– the total number of extracted clones; 

– the reduction in size of the file (in terms of lines of code); 

– the average reduction in size for functions that included at least one extracted 
clone (in terms of lines of code). 





        # of lines   PDG     running time   # of clone groups   total # of clones   file size   av. fn size 
        of source    nodes   (elapsed)      extracted           extracted           reduction   reduction 
file 1  1677         2235    1:02 min       3                   6                   1.9%        5.0% 
file 2  2621         4006    7:49 min       12                  24                  4.7%        12.4% 
file 3  3343         6761    5:15 min       3                   7                   2.1%        4.4% 
file 4  3419         4845    13:00 min      12                  40                  4.9%        10.3% 

Fig. 9. IBM File Sizes and Clone-Extraction Data. 



Of the 30 clone groups that were extracted, 2 involved reordering of matching 
statements, 2 involved intertwined clones, and most of them involved renamed 
variables. Of the 77 extracted clones, 17 were non-contiguous. 

3.3 Summary of Experimental Results 

The results of our experiments indicate that our approach is capable of find- 
ing interesting clones that would be missed by other approaches. Many of these 
clones are non-contiguous and involve variable renaming; some also involve state- 
ment reordering and intertwining. The Unix-code studies also indicate that the 
tool is not likely to miss any clones that a human would consider ideal, and 
additionally is not likely to produce too many clones that a human would
consider uninteresting (except very small ones). The IBM-code study provides some
additional data about the amount of extractable duplicated code that may be 
found by our tool, and how extracting that code affects file and function sizes. 
Of course, the more important question is how duplicate code extraction affects 
the ease of future maintenance; unfortunately, such a study requires resources 
beyond those available to us (as noted in the Introduction, the work of US] does 
provide a first step in that direction). 

The two sets of studies also reveal that the current implementation often
finds multiple variants of ideal clones rather than just the ideal ones. This may
not, however, be a problem in practice: manually examining the 250 clone groups
reported by the tool for the four IBM files and identifying the corresponding 30
ideal clone groups took only about 3 hours. Nevertheless, future work includes
devising more heuristics (like the one described in Subsection 12. .'ill that will
reduce the number of variants reported by the tool by finding clones that are
closer to ideal.

As for the running time, although the tool is currently very slow, we believe 
that this is more a question of its implementation than of some fundamental 
problem with the approach. As the table in Figure 9 indicates, the bottleneck
is finding clone pairs; one reason this step is so slow is that it is implemented in 
Scheme, and we use a Scheme interpreter, not a compiler. Another factor is that 
our primary concern has been to get an initial implementation running so that 
we can use the results to validate our approach (rather than trying to implement 
the algorithm as efficiently as possible). Future engineering efforts may reduce 
the time significantly. Furthermore, improvements that eliminate the generation 
of undesirable clones (e.g., variants of ideal clones) should speed up the tool. 
Finally, it may be possible (and profitable) to generate clone groups directly,
rather than generating clone pairs and then combining them into groups (because
for each clone group that contains $n$ clones, we currently generate $(n^2 - n)/2$
clone pairs first).
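To make the quadratic cost concrete, the following Python sketch enumerates the clone pairs that a single group of n clones gives rise to and checks the count against (n^2 - n)/2. The clone labels are hypothetical placeholders; the tool itself finds pairs by matching PDG slices, not by enumerating labels.

```python
from itertools import combinations

def clone_pairs(group):
    """Every unordered pair of clones drawn from one clone group."""
    return list(combinations(group, 2))

def pair_count(n):
    """Closed form for the number of pairs: (n^2 - n) / 2."""
    return (n * n - n) // 2

group = ["c1", "c2", "c3", "c4", "c5"]   # hypothetical clone labels
pairs = clone_pairs(group)
print(len(pairs), pair_count(len(group)))  # 10 10
```

A group of 10 clones already means 45 pairs, which is one reason generating groups directly could pay off.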



4 Related Work 

The long-term goal of our research project is a tool that not only finds clones, but 
also automatically extracts a user-selected group of clones into a procedure. A 
first step in that direction was an algorithm for semantics-preserving procedure 
extraction CH. However, that algorithm only applies to a single clone; different 
techniques are needed to determine when and how a group of clones can be 
extracted into a procedure while preserving semantics. Also, while that work 
was related to the work presented here in terms of our overall goal, it addressed
a very different aspect, namely, how to do procedure extraction; there was no 
discussion of how to identify the code to be extracted, which is the subject of 
the current work. 

Other related work falls into 3 main categories: work on clone detection, work 
on converting procedural code to object-oriented code, and work on subgraph 
isomorphism. 

Clone Detection: Baker HEI describes an approach that finds all pairs of 
matching “parameterized” code fragments. A code fragment matches another 
(with parameterization) if both fragments are contiguous sequences of source 
lines, and some global substitution of variable names and literal values applied 
to one of the fragments makes the two fragments identical line by line. Comments 
are ignored, as is whitespace within lines. Because this approach is text-based 
and line-based, it is sensitive to lexical aspects like the presence or absence of 
new lines, and the ordering of matching lines in a clone pair. Our approach does 
not have these shortcomings. Baker’s approach does not find intertwined clones. 
It also does not (directly) find non-contiguous clones. A postpass can be used to 
group sets of matching fragments that occur close to each other in the source, 
but there is no guarantee that such sets belong together logically. 
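A minimal Python sketch of a parameterized line-by-line match in the spirit of Baker's approach, assuming a toy C-like tokenizer and a hypothetical keyword set. Each distinct identifier or literal is renamed P0, P1, ... in order of first occurrence, so two fragments match exactly when a consistent one-to-one renaming makes them identical. It only compares two given fragments and does not reproduce how Baker's tool searches a whole program, but it shows the sensitivity to line order described above.

```python
import re

# Identifiers, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Hypothetical keyword set; keywords are matched literally, never parameterized.
KEYWORDS = {"if", "else", "while", "for", "return", "int"}

def normalize(fragment):
    """Rename each distinct identifier/literal to P0, P1, ... by first occurrence.

    Comments (// to end of line) and whitespace inside lines are ignored, but
    line boundaries and line order are kept, as in a line-based matcher.
    """
    names = {}
    out = []
    for line in fragment:
        code = line.split("//")[0]
        toks = []
        for tok in TOKEN.findall(code):
            is_param = tok[0].isalpha() or tok[0] == "_" or tok.isdigit()
            if is_param and tok not in KEYWORDS:
                toks.append(names.setdefault(tok, f"P{len(names)}"))
            else:
                toks.append(tok)
        out.append(tuple(toks))
    return out

def p_match(a, b):
    """True if a consistent renaming makes fragments a and b identical line by line."""
    return len(a) == len(b) and normalize(a) == normalize(b)

f1 = ["x = a + 1;", "if (x > b) return x;"]
f2 = ["y = c + 2;", "if (y>d) return y;  // renamed copy"]
f3 = ["x = a + 1;", "if (a > b) return x;"]
print(p_match(f1, f2), p_match(f1, f3))  # True False
```

Swapping the two lines of f2 makes the match fail, even though the computation is unchanged.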
Kontogiannis et al m describe a dynamic-programming-based approach that
computes and reports for every pair of begin-end blocks in the program the 
distance (i.e., degree of similarity) between the blocks. The hypothesis is that 
pairs with a small distance are likely to be clones caused by cut and paste 
activities. The distance between a pair of blocks is defined as the least costly 
sequence of insert, delete and edit steps required to make one block identical 
line-by-line to the other. This approach does not find clones in the sense of our 
approach, or Baker’s approach. It only gives similarity measures, leaving it to 
the user to go through block pairs with high reported similarity and determine 
whether or not they are clones. Also, since it works only at the block level it can 
miss clone fragments that are smaller than a block, and it does not effectively 
deal with variable renamings or with non-contiguous or out-of-order matches. 
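The block distance can be sketched as the classic edit-distance recurrence over lines. This Python version assumes a unit cost for each line insertion, deletion and edit, which is a simplification; the cost model used by Kontogiannis et al. may differ. The second example shows why such a measure copes poorly with variable renaming: a consistently renamed copy is as far away as an entirely different block of the same length.

```python
def block_distance(a, b):
    """Least number of line insertions, deletions and edits turning block a into b
    (unit cost per step, a simplifying assumption)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # delete all i lines of a
    for j in range(n + 1):
        d[0][j] = j                     # insert all j lines of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            edit = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a line of a
                          d[i][j - 1] + 1,         # insert a line of b
                          d[i - 1][j - 1] + edit)  # keep or edit a line
    return d[m][n]

a = ["x = 0;", "y = 1;", "z = x + y;"]
b = ["x = 0;", "y = 2;", "z = x + y;"]
renamed = ["p = 0;", "q = 1;", "r = p + q;"]
print(block_distance(a, b), block_distance(a, renamed))  # 1 3
```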

Two other approaches that involve metrics are reported in ['/II 4) . The ap- 
proach of 0 computes certain features of code blocks and then uses neural 
networks to find similar blocks based on their features, while m uses function 
level metrics (e.g., number of lines of source, number of function calls contained, 
number of CFG edges, etc.) to find similar functions. 

Baxter et al ^ find exact clones by finding identical abstract-syntax tree 
subtrees, and inexact clones by finding subtrees that are identical when variable 
names and literal values are ignored. Non-contiguous and out-of-order matches 
will not be found. This approach completely ignores variable names when asked
to find inexact matches; this is a problem because ignoring variable names means
ignoring all data flow, which in turn can yield matches that are not
meaningful computations worthy of extraction.
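To illustrate inexact AST matching, the sketch below uses Python's own ast module on Python source as a stand-in for C abstract-syntax trees (an assumption made so the example is self-contained). Each statement subtree is reduced to a shape in which variable names and literal values are erased, and statements with equal shapes are grouped; the minimum subtree size is a hypothetical threshold. The second call reproduces the data-flow problem described above: a = a + 1 and c = b + 1 are reported as clones even though their data flow differs.

```python
import ast
from collections import defaultdict

def shape(node):
    """Structural key of an AST subtree with variable names and literal values erased."""
    if isinstance(node, ast.Name):
        return ("Name",)
    if isinstance(node, ast.Constant):
        return ("Const",)
    return (type(node).__name__,) + tuple(shape(c) for c in ast.iter_child_nodes(node))

def size(key):
    """Number of nodes in a shape key."""
    return 1 + sum(size(child) for child in key[1:])

def inexact_clones(source, min_size=4):
    """Group statements whose subtrees are identical once names/literals are ignored.
    Returns lists of starting line numbers, one list per group of two or more."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            key = shape(node)
            if size(key) >= min_size:        # hypothetical size threshold
                groups[key].append(node.lineno)
    return [sorted(lines) for lines in groups.values() if len(lines) > 1]

print(inexact_clones("a = b * 2 + c\nx = y * 3 + z\nq = q - 1\n"))  # [[1, 2]]
print(inexact_clones("a = a + 1\nc = b + 1\n"))                     # [[1, 2]]
```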

Debray et al 0 use the CFG to find clones in assembly-language programs 
for the purpose of code compression. They find matching clones only when they 
occur in different basic blocks, no intertwined clones, and only a limited kind of 
non-contiguous clones. 

Converting Procedural Code to Object-Oriented Code: The primary 
goal of the work described by Bowdidge and Griswold in is to help convert 
procedural code to object-oriented code by identifying methods. As part of this 
process, they do a limited form of clone detection. Given a variable of interest, the 
tool does forward slicing from all uses of the variable. The slices are subsequently 
decomposed into a set of (overlapping) paths, with each path stretching from 
the “root” node (i.e., the node that has the use of the variable) to the end point 
of the slice. Finally, the paths obtained from all slices are overlaid visually
on a single diagram (only the operators of the nodes are shown) with common 
prefixes drawn only once. Each common prefix is a set of isomorphic paths in 
the PDG and therefore represents a duplicated computation; the user selects the 
prefixes to be extracted. There are a few significant differences between their 
approach and ours. They report only isomorphic paths in the PDG, whereas we 
report isomorphic partial slices. Our observation is that most clones that are 
interesting and worthy of extraction are not simply paths in the PDG. Their 
diagram can be very large for large programs, making it tedious for the user to 
figure out what clones to extract. Finally, they do only forward slicing, which in 
our experience is not as likely to produce meaningful clones as a combination of
backward and forward slicing; for example, of all the clones found by our tool 
that are illustrated in this paper, only the ones in Figures 0 and 0 correspond 
to forward slices. 
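Overlaying paths so that common prefixes are drawn once is essentially building a prefix tree (trie). The Python sketch below counts prefixes directly, which is equivalent to counting how many paths pass through each trie node, and reports the maximal prefixes shared by at least two paths. The operator names and the min_len cut-off are hypothetical, and reporting only maximal prefixes is a choice made for this example; in Bowdidge and Griswold's tool the user selects prefixes from the diagram.

```python
from collections import Counter

def shared_prefixes(paths, min_len=2):
    """Maximal operator-sequence prefixes shared by at least two paths.

    Each prefix is counted once per path, which equals the number of paths
    passing through the corresponding node of a prefix tree (trie).
    """
    counts = Counter(tuple(p[:k]) for p in paths for k in range(min_len, len(p) + 1))
    shared = {pre for pre, n in counts.items() if n >= 2}
    # Keep a prefix only if no longer shared prefix extends it.
    return sorted(pre for pre in shared
                  if not any(o != pre and o[:len(pre)] == pre for o in shared))

paths = [
    ["use v", "+", "*", "store"],   # hypothetical operator paths from forward slices
    ["use v", "+", "*", "call"],
    ["use v", "-", "store"],
]
print(shared_prefixes(paths))  # [('use v', '+', '*')]
```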

Subgraph Isomorphism: A number of people have studied the problem of 
identifying maximal isomorphic subgraphs Mliblbll8| . Since this in general is 
a computationally hard problem, these approaches typically employ heuristics 
that seem to help especially when the graphs being analyzed are representations 
of molecules. In our approach we identify isomorphic partial slices, not general 
isomorphic subgraphs. We do this not only to reduce the computational com- 
plexity, but also because clones found this way seem more likely to be meaningful 
computations that are desirable as separate procedures. 

5 Conclusions 

We have described the design and implementation of a tool that finds duplicated 
code fragments in C programs and displays them to the programmer. The most 
innovative aspect of our work is the use of program-dependence graphs and 
program slicing, which allows our tool to find non-contiguous clones, intertwined 
clones, and clones that involve variable renaming and statement reordering. 

Our implementation indicates that the approach is a good one; real code 
does include the kinds of clones that our tool is well-suited to handle (and that 
most previous approaches to clone detection would not be able to find), and 
the tool does find the clones that would be identified by a human. However, it 
currently finds many variants of the ideal clones. Future work includes developing 
heuristics to cut down on the number of variants identified, as well as to improve 
the running time of the implementation. 
Abstract. Hardware designs typically combine parallelism and resource-
sharing; a circuit’s correctness relies on shared resources being accessed
mutually exclusively. Conventional high-level synthesis systems guaran- 
tee mutual exclusion by statically serialising access to shared resources 
during a compile-time process called scheduling. This approach suffers 
from two problems: (i) there is a large class of practical designs which 
cannot be scheduled statically; and (ii) a statically fixed schedule re- 
moves some opportunities for parallelism leading to less efficient circuits. 
This paper surveys the expressivity of current scheduling methods and 
presents a new approach which alleviates the above problems: first 
scheduling logic is automatically generated to resolve contention for 
shared resources dynamically; then static analysis techniques remove re- 
dundant scheduling logic. 

We call our method Soft Scheduling to highlight the analogy with Soft 
Typing: the aim is to retain the flexibility of dynamic scheduling whilst 
using static analysis to remove as many dynamic checks as possible. 



1 Introduction 

At the structural level a hardware design can be seen as a set of interconnected 
resources. These resources run concurrently and are often shared. 

The interaction between parallelism and resource-sharing leads to an obvi- 
ous problem: how does one ensure that shared resources are accessed mutually 
exclusively? Existing silicon compilers solve the mutual exclusion problem by
statically serialising operations during a compile-time scheduling phase (see
Section 2). This paper describes an alternative approach:

We automatically generate circuitry to perform scheduling dynamically 
in a manner which avoids deadlock. Efficient circuits are obtained by 
employing static analysis to remove redundant scheduling logic. 

Our method is to scheduling as Soft Typing 0 is to type checking (see
Figure 1): the aim is to retain the flexibility of dynamic scheduling whilst using
static analysis to remove as many dynamic checks as possible. To highlight this 
analogy we choose to call our method Soft Scheduling. 

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 57-|T51 2001. 

© Springer-Verlag Berlin Heidelberg 2001
          | Typing                                   | Scheduling
----------+------------------------------------------+------------------------------------------
Static    | No dynamic checks required in object     | No scheduling logic required in final
          | code. Not all valid programs pass the    | circuit. Not all valid programs can be
          | type checker.                            | scheduled statically.
----------+------------------------------------------+------------------------------------------
Dynamic   | Dynamic checking of argument types       | Scheduling logic required on each shared
          | required each time a function is         | resource in the final circuit. All valid
          | called. All valid programs can be run.   | programs can be scheduled.
----------+------------------------------------------+------------------------------------------
Soft      | Fewer dynamic checks required (some      | Less scheduling logic required (some
          | removed statically). All valid           | removed statically). All valid programs
          | programs can be run.                     | can be scheduled.
Fig. 1. An Informal Comparison between Soft Scheduling and Soft Typing. 



Although this paper considers the application of Soft Scheduling to hardware 
synthesis, the technique is also applicable to software compilation. Aldrich et 
al. advocate a similar approach which uses static analysis to remove redundant 
synchronization from Java programs. 



1.1 Conventional High-Level Synthesis 

The hardware community refer to high-level, block-structured languages as be- 
havioural. At a lower level, structural languages describe a circuit as a set of 
components, such as registers and multiplexers connected with wires and buses 
(e.g. RTL Verilog 0). High-level synthesis (sometimes referred to as behavioural 
synthesis) is the process of compiling a behavioural specification into a structural 
hardware description language. 

A number of behavioural synthesis systems have been developed for popular 
high-level languages (e.g. CtoV |2Dj and Handel Id). Such systems typically
translate high-level specifications into an explicitly parallel flow-graph represen- 
tation where allocation, binding and scheduling jH] are performed: 

— Allocation is typically driven by user-supplied directives and involves choos- 
ing which resources will appear in the final circuit (e.g. 3 adders, 2 multipliers 
and an ALU). 

— Binding is the process of assigning operations in the high-level specification 
to low-level resources — e.g. the + in line 4 of the source program will be 
computed by adder_1 whereas the + in line 10 will be computed by the ALU.

— Scheduling involves assigning start times to operations in the flow-graph such 
that no two operations will attempt to access a shared resource simultane- 
ously. Mutually-exclusive access to shared resources is ensured by statically 
serialising operations during scheduling. 
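As a rough illustration of scheduling by start-time assignment, here is a greedy list scheduler in Python. It assumes every operation has a statically bounded delay and books each resource as a list of busy intervals; the operation names, delays and binding are hypothetical, and production synthesis tools use considerably more sophisticated algorithms.

```python
def list_schedule(ops, deps, binding, delay):
    """Greedy static scheduling sketch.

    ops:     operation names in a topological order of the sequencing graph
    deps:    op -> list of predecessor ops
    binding: op -> resource that will execute it
    delay:   op -> execution time in cycles (must be statically bounded)
    Returns op -> start cycle, such that no two operations bound to the same
    resource overlap in time.
    """
    start = {}
    booked = {}  # resource -> list of (start, end) intervals already taken
    for op in ops:
        # Earliest start: after every predecessor has finished.
        t = max((start[p] + delay[p] for p in deps.get(op, [])), default=0)
        busy = booked.setdefault(binding[op], [])
        # Slide forward until the resource is free for the whole duration.
        while any(s < t + delay[op] and t < e for s, e in busy):
            t += 1
        busy.append((t, t + delay[op]))
        start[op] = t
    return start

ops = ["m1", "m2", "a1"]
deps = {"a1": ["m1", "m2"]}
binding = {"m1": "mult", "m2": "mult", "a1": "adder"}
delay = {"m1": 2, "m2": 2, "a1": 1}
print(list_schedule(ops, deps, binding, delay))  # {'m1': 0, 'm2': 2, 'a1': 4}
```

Because m1 and m2 are bound to the same multiplier, m2 is pushed back until m1 finishes: the two independent multiplications are serialised statically.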
The Contributions of this Paper

1. In contrast to conventional scheduling, we describe a method which gener- 
ates logic to schedule shared resources dynamically. We show that (i) our 
approach is more expressive: all valid programs can be scheduled; and (ii) in 
some cases, our approach can generate more efficient designs by exploiting 
parallelism possibilities removed by static scheduling. 

2. We describe a high-level static analysis that enables us to remove redundant 
scheduling logic and show that this can significantly improve the efficiency 
of generated circuits. 

We have implemented Soft Scheduling as part of the FLaSH Synthesis System
(see Section 1.2), a novel hardware synthesis package being developed in
conjunction with Cambridge University and AT&T Laboratories Cambridge. This
paper presents Soft Scheduling in the framework of the FLaSH silicon compiler.

1.2 The FLaSH Synthesis System 

In previous work we introduced a hardware description language, SAFL m 
(Statically Allocated Functional Language), and sketched its translation to hard- 
ware. An optimising silicon compiler (called FLaSH m — Functional Languages 
for Synthesising Hardware) has been implemented to translate SAFL into hi- 
erarchical RTL Verilog. The system has been tested on a number of designs, 
including a small commercial processor¹.

Although, for expository purposes, this paper describes our method in the 
framework of SAFL, Soft Scheduling techniques are applicable to any high-level 
Hardware Description Language which allows function definitions to be treated 
as shared resources (e.g. HardwareC m, Balsa P], Tangram |2]). Indeed we
have extended SAFL with π-calculus HH style channels and assignment without
modifying the Soft Scheduling phase H2|.



Outline of Paper. The remainder of this paper is structured as follows:
Section 2 surveys existing scheduling techniques and explains the motivation for our
research; the SAFL language and its translation to hardware are briefly outlined
in Section 3; Section 4 presents the technical details of Soft Scheduling; some
practical examples are described in Section 5.

2 Comparison with Other Work 

Traditional high-level synthesis packages perform scheduling using a data-struc- 
ture called a sequencing graph — a partial ordering which makes dependencies 

^ The instruction set of Cambridge Consultants XAP processor was implemented 
(see http://www.camcon.co.uk). We did not include the SIF instruction (a form
of debugging breakpoint which transfers data to or from internal registers via a 
separately clocked serial interface). 
between operations explicit. Recall that, in this context, scheduling is performed
by assigning a start time to each operation in the graph such that operations 
which invoke a shared resource do not occur in parallel 0 . There are a number 
of problems with this approach: 

1. The time taken to execute each operation in the sequencing graph must be 
bounded statically (and in general padded to this length). This restriction 
means that conventional scheduling techniques are not expressive enough to 
handle a large class of practical designs. For example, it is impossible to 
statically schedule an operation to perform a bus transaction of unknown 
length. 

2. Since operations are scheduled statically one must be pessimistic about what 
may be invoked in parallel in order to achieve safety. This can inhibit par- 
allelism in the final design by unnecessarily serialising operations. 

Ku and De Micheli have proposed Relative Scheduling m which extends the 
method outlined above to handle operations with statically unbounded compu- 
tation times. Their technique partitions a flow-graph into statically-schedulable 
segments separated by anchor nodes — nodes which have unbounded execution 
delays. Each segment is scheduled separately at compile-time. Finally, the com- 
piler connects segments together by generating logic to signal the completion of 
anchor nodes dynamically. 

In [E] Ku and De Micheli show how Relative Scheduling of shared resources is 
integrated into their Olympus Hardware Synthesis System ^ . Their method per- 
mits the scheduling of operations whose execution time is not statically bounded, 
hence alleviating Problem 1 (above). However, potential contention for shared 
resources is still resolved by serialising operations at compile time so Problem 2 
remains. Furthermore, there is still a class of practical designs which cannot be 
scheduled by Olympus. Consider the following example. 

Using || as a parallel composition operator and assuming suitable definitions
of procedures Processor, DMA_Controller and Memory, we would essentially like
to describe the system of Figure 2 as:

Processor() || DMA_Controller() || Memory()

Since the operations corresponding to the invocation of the Processor and
DMA_Controller both access a shared resource (Memory), the Olympus Synthesis
System requires that the calls be serialised. However, if neither the call
to Processor() nor the call to DMA_Controller() terminates², attempting to
sequentialise the operations is futile; the correct operation of the system relies on
their parallel interleaving. Soft Scheduling is expressive enough to cope with
non-terminating operations: the FLaSH compiler automatically generates an arbiter
to ensure mutually exclusive access to the Memory whilst allowing the Processor
and DMA_Controller to operate in parallel (see Figure 2(ii)). The following table
summarises the expressivity of various scheduling methods.

^ This is not merely a contrived example. In real designs both Processors and DMA 
Controllers are typically non-terminating processes which constantly update the ma- 
chine state. 
Fig. 2. A Hardware Design Containing a Memory Device Shared between a
DMA Controller and a Processor. 





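         | Bounded execution | Unbounded execution | Non-terminating
         | delays            | delays              | operations
---------+-------------------+---------------------+----------------
Static   | yes               | no                  | no
Relative | yes               | yes                 | no
Soft     | yes               | yes                 | yes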

Although the technique of using arbiters to protect shared resources is widely 
employed, current hardware synthesis packages require arbitration to be coded 
at the structural level on an ad hoc basis. Since arbitration can impose an over- 
head both in terms of chip area and time, programmers often try to eliminate 
unnecessary locking operations manually. For large designs this is a tedious and 
error-prone task which often results in badly structured and less reusable code. 
In contrast the Soft Scheduling approach analyses a behavioural specification, au- 
tomatically inserting arbiters on a where-needed basis. This facilitates readable 
and maintainable source code without sacrificing efficiency. 

This paper does not discuss the SAFL language in depth. A detailed com- 
parison of SAFL with other hardware description languages including Verilog, 
VHDL, MuFP, Lava, ELLA and Lustre can be found in m.

3 An Overview of the SAFL Language 

SAFL is a language of first order recurrence equations with an ML |I3] style 
syntax. A user program consists of a sequence of function definitions: 

$$\texttt{fun } f_1(\vec{x}) = e_1;\ \ldots;\ \texttt{fun } f_n(\vec{x}) = e_n$$

Programs have a distinguished function, main (usually $f_n$), which represents an
external world interface — at the hardware level it accepts values on an input 
port and may later produce a value on an output port. The abstract syntax of
SAFL expressions, $e$, is as follows (we abbreviate tuples $(e_1,\ldots,e_k)$ as $\vec{e}$ and
similarly $(x_1,\ldots,x_k)$ as $\vec{x}$):

— variables: $x$; constants: $c$;

— user function calls: $f(\vec{e})$;

— primitive function calls: $a(\vec{e})$, where $a$ ranges over primitive operators (e.g. +, <=, && etc.);

— conditionals: $e_1\ ?\ e_2 : e_3$; and

— let bindings: let $\vec{x} = \vec{e}$ in $e_0$ end

In order to distinguish distinct call sites we assume that each abstract-syntax
node is labelled with a unique identifier, $\alpha$, writing $f^\alpha(e_1,\ldots,e_k)$ to indicate a
call to function $f$ at abstract-syntax node $\alpha$.

Although functions can call other previously defined functions arbitrarily, the 
only form of recursion allowed is tail-recursion. This allows us to statically
allocate the storage (e.g. registers and memories) required by a SAFL program m.
Tail-recursive calls are compiled into feedback loops at the circuit level.

SAFL is a call-by-value language. All function-call arguments and let-definitions
are evaluated in parallel. Operations can be sequenced using the let construct
since the language semantics state that all let-declarations must terminate
before the let-body is evaluated.

We compile SAFL to hardware in a resource-aware manner: each function
definition is mapped into a single hardware-level resource, and functions
which are called more than once become shared resources. For example, consider
the following SAFL code:

fun mult(x, y, acc) =
    if (x=0 or y=0) then acc
    else mult(x<<1, y>>1, if y.bit0 then acc+x else acc)

fun square(x) = mult(x, x, 0)

fun cube(x) = mult(x, mult(x, x, 0), 0)

This SAFL specification describes a circuit containing a single shift-add multi- 
plier shared between hardware-blocks to compute squares and cubes. Notice how 
in contrast to traditional high-level synthesis (see Section 1.1), the resource-aware
interpretation of SAFL specifications explicitly contains allocation and binding 
information. (Although not of direct relevance to this paper, in jl b| we show 
how fold/unfold transformations P| can be used to explore various allocation 
and binding constraints.) 
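To see what the shared multiplier computes, the following Python sketch transcribes mult, square and cube directly. The self tail call in mult becomes a while loop, loosely mirroring how tail recursion is compiled into a feedback loop; SAFL's parallel evaluation of call arguments is modelled by computing the new acc from the old x and y before shifting them. Inputs are assumed to be non-negative integers (for a negative y, y >> 1 never reaches 0 in Python).

```python
def mult(x, y, acc):
    """Shift-add multiplier transcribed from the SAFL definition of mult.

    The self tail call becomes a loop. Assumes x, y >= 0.
    """
    while not (x == 0 or y == 0):
        # All new argument values are computed from the old ones, as in SAFL.
        acc = acc + x if y & 1 else acc   # y & 1 plays the role of y.bit0
        x, y = x << 1, y >> 1
    return acc

def square(x):
    return mult(x, x, 0)

def cube(x):
    return mult(x, mult(x, x, 0), 0)

print(square(7), cube(5))  # 49 125
```

In the SAFL version both square and cube call the same mult, which is why the compiled circuit contains a single shared multiplier.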

3.1 Translating SAFL to Hardware 

As in Relative Scheduling m the FLaSH compiler generates logic to explicitly 
signal the completion of operations. More precisely, each SAFL function definition,
$f$, is compiled into a single resource, $H_f$, consisting of:
— logic to compute its body expression

— multiple control and data inputs: one control/data input-pair for each call site

— multiple control outputs (one to return control to each caller) 

— a single data output (which is shared between all callers) 

An example of function connectivity is given in Figure 3. In this example
resource $H_f$ is shared between $H_g$ and $H_h$. Notice how $H_f$’s data output is
shared, but the control structure is duplicated on a per-call basis.

To perform a call to resource $H_f$ the caller places the argument values on its
data input into $H_f$ before triggering a call event on the corresponding control
input. At some later point, when $H_f$ has finished computing, the result of the
call is placed on $H_f$’s shared data output and an event is generated on the
corresponding control output. Full details of the translation to hardware can be
found in HB|.

4 Soft Scheduling: Technical Details 

To protect shared resources the FLaSH compiler automatically generates schedul- 
ing logic to resolve conflicts dynamically (see FigureEj). The scheduling circuitry 
consists of two parts: (i) an arbiter to select which caller to service; and (ii) a 
locking mechanism to ensure the resource is accessed mutually exclusively. For 
the sake of brevity, this paper uses the term arbiter to refer to both the arbiter 
and locking structure. 

Our approach is the hardware equivalent of using binary semaphores to pro- 
tect critical regions in multi-threaded software. The analogy between arbiters 
and semaphores is explored further in HS| where a compilation function from 
SAFL to software is presented. 
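The semaphore analogy can be tried out in software. In this Python sketch a threading.Lock plays the role of the arbiter and locking logic guarding a shared memory, and two threads stand in for the Processor and DMA_Controller discussed earlier; the class and function names are hypothetical, and the loops are bounded only so that the demo terminates. Unlike a hardware arbiter design, Python's lock gives no fairness guarantee.

```python
import threading

class ArbitratedMemory:
    """A shared memory whose single lock acts like an arbiter plus locking logic."""

    def __init__(self, size):
        self.cells = [0] * size
        self.lock = threading.Lock()  # binary semaphore: one caller at a time
        self.active = 0               # callers currently inside the resource
        self.max_active = 0           # highest concurrency ever observed

    def access(self, addr, delta):
        with self.lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            self.cells[addr] += delta  # read-modify-write must not interleave
            self.active -= 1

def client(mem, steps):
    # Stand-in for a non-terminating Processor or DMA_Controller loop;
    # bounded here only so the demo finishes.
    for _ in range(steps):
        mem.access(0, 1)

mem = ArbitratedMemory(4)
workers = [threading.Thread(target=client, args=(mem, 10_000), name=n)
           for n in ("Processor", "DMA_Controller")]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(mem.cells[0], mem.max_active)  # 20000 1
```

Both clients run concurrently, yet max_active never exceeds 1: the lock serialises accesses dynamically rather than by a fixed schedule.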



4.1 Removing Redundant Arbiters 

Just because a resource is shared does not necessarily mean that arbitration is 
required. For example consider the following SAFL program: 

fun f(x) = ...
fun g(x) = f(f(x))

In this case, the two calls to f cannot occur in parallel: the innermost call must
complete before the outermost call can begin (recall that SAFL is a call-by-value
language). We do not need to generate an arbiter to serialise the calls to $H_f$:
from the structure of the program we can statically determine that the two calls
will not try to access f simultaneously.

We use Parallel Conflict Analysis (see Section 4.2) to detect redundant
arbiters. Removing unnecessary arbitration is important for two reasons:






Richard Sharp and Alan Mycroft 




Fig. 3. A structural diagram of the hardware circuit corresponding to a shared
function, $f$, called by functions $g$ and $h$. Data buses are shown as thick lines,
control wires as thin lines.



1. Arbitration takes time: in the current version of the FLaSH compiler ar- 
bitration adds one cycle latency to a call even if the requested resource is 
available at the time of call. Although we may accept this latency if it is 
small in comparison to the callee’s average execution time, consider the case 
where the callee is a frequently used resource with a small execution delay. In 
this case an arbiter may significantly degrade the performance of the whole 
system (see Example 5.1).

2. Arbitration uses chip area: although the gate-count of an arbiter is typically 
small compared to the resource as a whole, the extra wiring complexity 
required to represent request and grant signals adds to the area of the final 
design. 

Arbiters are inserted at the granularity of calls. This offers increased
performance over inserting arbiters on a per-resource basis. For example, in a design
containing a function, $f$, shared between five callers, we may infer that only two
calls to $f$ require an arbiter; the other three calls need not suffer the overhead
of arbitration.



4.2 Parallel Conflict Analysis (PCA) 

Parallel Conflict Analysis (PCA) is performed over the structure of a whole
SAFL program in order to determine which function calls may occur in parallel.
If a group of calls to the same function may occur in parallel then we say that


Soft Scheduling for Hardware 






the group is conflicting. We only need to synthesise logic to arbitrate between
conflicting calls since, by definition, if a call $f^\alpha$ is not in a conflicting group then
no other call to $f$ can occur in parallel with $f^\alpha$.

The result of PCA is a conflict set: a set of calls which require arbiters.
For example, if the resulting conflict set is $\{f^\alpha, f^\beta, f^\gamma, g^\delta, g^\epsilon\}$ then we would
synthesise two arbiters: one for the conflicting group $\{f^\alpha, f^\beta, f^\gamma\}$, the other for the
conflicting group $\{g^\delta, g^\epsilon\}$.

We now proceed to define PCA. Let $e_f$ represent the body of function $f$.
Let the predicate $\mathit{RecursiveCall}(f^\alpha)$ hold iff $f^\alpha$ is a recursive call (i.e. $f^\alpha$ occurs
within the body of $f$). $\mathcal{C}[\![e]\!]$ returns the set of non-recursive calls which may
occur as a result of evaluating expression $e$:



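$$
\begin{aligned}
\mathcal{C}[\![x]\!] &= \emptyset \\
\mathcal{C}[\![c]\!] &= \emptyset \\
\mathcal{C}[\![a(e_1,\ldots,e_k)]\!] &= \bigcup_{1\le i\le k} \mathcal{C}[\![e_i]\!] \\
\mathcal{C}[\![f^\alpha(e_1,\ldots,e_k)]\!] &= \Big(\bigcup_{1\le i\le k} \mathcal{C}[\![e_i]\!]\Big) \cup
  \begin{cases} \emptyset & \text{if } \mathit{RecursiveCall}(f^\alpha) \\ \{f^\alpha\} \cup \mathcal{C}[\![e_f]\!] & \text{otherwise} \end{cases} \\
\mathcal{C}[\![\texttt{if } e_1 \texttt{ then } e_2 \texttt{ else } e_3]\!] &= \bigcup_{1\le i\le 3} \mathcal{C}[\![e_i]\!] \\
\mathcal{C}[\![\texttt{let } \vec{x} = \vec{e} \texttt{ in } e_0]\!] &= \bigcup_{0\le i\le k} \mathcal{C}[\![e_i]\!]
\end{aligned}
$$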



$\mathit{PC}(S_1,\ldots,S_n)$ takes sets of calls, $(S_1,\ldots,S_n)$, and returns the conflict set
resulting from the assumption that calls in each $S_i$ are evaluated in parallel with
calls in each $S_j$ ($j \ne i$):



$$\mathit{PC}(S_1,\ldots,S_n) = \bigcup_{1\le i\le n} \{\, f^\alpha \in S_i \mid \exists j \ne i.\ \exists \beta.\ f^\beta \in S_j \,\}$$



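We are now able to define $\mathcal{A}[\![e]\!]$, which returns the conflict set due to expression $e$:

$$
\begin{aligned}
\mathcal{A}[\![x]\!] &= \emptyset \\
\mathcal{A}[\![c]\!] &= \emptyset \\
\mathcal{A}[\![a(e_1,\ldots,e_k)]\!] &= \mathit{PC}(\mathcal{C}[\![e_1]\!],\ldots,\mathcal{C}[\![e_k]\!]) \cup \bigcup_{1\le i\le k} \mathcal{A}[\![e_i]\!] \\
\mathcal{A}[\![f(e_1,\ldots,e_k)]\!] &= \mathit{PC}(\mathcal{C}[\![e_1]\!],\ldots,\mathcal{C}[\![e_k]\!]) \cup \bigcup_{1\le i\le k} \mathcal{A}[\![e_i]\!] \\
\mathcal{A}[\![\texttt{if } e_1 \texttt{ then } e_2 \texttt{ else } e_3]\!] &= \bigcup_{1\le i\le 3} \mathcal{A}[\![e_i]\!] \\
\mathcal{A}[\![\texttt{let } \vec{x} = \vec{e} \texttt{ in } e_0]\!] &= \mathit{PC}(\mathcal{C}[\![e_1]\!],\ldots,\mathcal{C}[\![e_k]\!]) \cup \bigcup_{0\le i\le k} \mathcal{A}[\![e_i]\!]
\end{aligned}
$$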

Finally, for a program, $p$, consisting of a sequence of user-function definitions:

$$\texttt{fun } f_1(\ldots) = e_1;\ \ldots;\ \texttt{fun } f_n(\ldots) = e_n$$

$\mathcal{A}[\![p]\!]$ returns the conflict set resulting from program $p$. The letter $\mathcal{A}$ is used
since $\mathcal{A}[\![p]\!]$ represents the calls which require arbiters:

$$\mathcal{A}[\![p]\!] = \bigcup_{1\le k\le n} \mathcal{A}[\![e_{f_k}]\!]$$

Notice that the equation for $\mathcal{C}[\![f^\alpha(e_1,\ldots,e_k)]\!]$ is a little unusual in that it
is not defined compositionally. This reflects the fact that PCA depends on the
global structure of a whole SAFL program as opposed to just the local structure
of a function definition. $\mathcal{C}[\![\cdot]\!]$ is well-defined due to the predicate $\mathit{RecursiveCall}$
and the source restrictions on SAFL which ensure that the call-graph is acyclic.
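The equations for $\mathcal{C}$, $\mathit{PC}$ and $\mathcal{A}$ translate almost line for line into a small whole-program analysis. The Python sketch below assumes a hypothetical tuple encoding of SAFL abstract syntax (shown in the comments) and represents a call $f^\alpha$ as the pair (f, α). It is intended only to clarify the definitions, not to reproduce the FLaSH implementation.

```python
# Hypothetical tuple encoding of SAFL abstract syntax:
#   ("var", x)  ("const", c)  ("prim", op, [args])  ("call", f, label, [args])
#   ("if", e1, e2, e3)  ("let", [(x1, e1), ..., (xk, ek)], e0)
# A program maps each function name to its body; a call f^label is (f, label).

def calls(e, cur, prog):
    """C[e]: non-recursive calls that evaluating e (inside the body of cur) may make."""
    tag = e[0]
    if tag in ("var", "const"):
        return set()
    if tag == "prim":
        return set().union(*(calls(a, cur, prog) for a in e[2]))
    if tag == "call":
        _, f, label, args = e
        result = set().union(*(calls(a, cur, prog) for a in args))
        if f != cur:  # not RecursiveCall: include the call and everything e_f calls
            result |= {(f, label)} | calls(prog[f], f, prog)
        return result
    if tag == "if":
        return set().union(*(calls(x, cur, prog) for x in e[1:]))
    if tag == "let":
        exprs = [rhs for _, rhs in e[1]] + [e[2]]
        return set().union(*(calls(x, cur, prog) for x in exprs))
    raise ValueError(f"unknown expression {tag!r}")

def pc(sets):
    """PC(S1..Sn): calls in some Si to a function that is also called in some Sj, j != i."""
    out = set()
    for i, si in enumerate(sets):
        elsewhere = {f for j, sj in enumerate(sets) if j != i for f, _ in sj}
        out |= {c for c in si if c[0] in elsewhere}
    return out

def conflicts(e, cur, prog):
    """A[e]: the conflict set due to expression e."""
    tag = e[0]
    if tag in ("var", "const"):
        return set()
    if tag in ("prim", "call"):
        args = e[2] if tag == "prim" else e[3]
        par = pc([calls(a, cur, prog) for a in args])  # arguments run in parallel
        return par.union(*(conflicts(a, cur, prog) for a in args))
    if tag == "if":  # condition first, then one branch: no parallelism
        return set().union(*(conflicts(x, cur, prog) for x in e[1:]))
    if tag == "let":
        rhss = [rhs for _, rhs in e[1]]
        par = pc([calls(r, cur, prog) for r in rhss])  # declarations run in parallel
        return par.union(*(conflicts(x, cur, prog) for x in rhss + [e[2]]))
    raise ValueError(f"unknown expression {tag!r}")

def conflict_set(prog):
    """A[p]: union of A[e_f] over all function definitions f."""
    return set().union(*(conflicts(body, f, prog) for f, body in prog.items()))

x = ("var", "x")
prog = {
    "f": ("prim", "+", [x, ("const", 1)]),
    "g": ("call", "f", 1, [("call", "f", 2, [x])]),                      # f(f(x))
    "h": ("prim", "+", [("call", "f", 3, [x]), ("call", "f", 4, [x])]),  # f(x) + f(x)
    "k": ("prim", "+", [("call", "h", 5, [x]), ("call", "f", 6, [x])]),  # h(x) + f(x)
}
print(sorted(conflict_set(prog)))  # [('f', 3), ('f', 4), ('f', 6)]
```

In the example, g(x) = f(f(x)) contributes no conflicts, h adds its two parallel calls to f, and k shows the non-compositional effect: because k evaluates h in parallel with the call to f at label 6, that call conflicts with the calls inside h's body.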

4.3 Integrating PCA into the FLaSH Synthesis System 

After computing $\mathcal{A}[\![p]\!]$ at the abstract-syntax level the FLaSH Synthesis System
translates $p$ into an intermediate flow-graph representation which makes both
control and data paths explicit CHI. At this level, the Call-nodes which require
arbitration are tagged (i.e. we tag node $n$ iff $n$ represents a call $f^\alpha$ and $f^\alpha \in \mathcal{A}[\![p]\!]$).

When the circuit for $H_f$ is generated, only tagged calls to $f$ are fed through
an arbiter; other calls are merely multiplexed. If none of the calls to $f$ are in $\mathcal{A}[\![p]\!]$
then $H_f$’s arbiter is eliminated completely. As Section 5 shows, using Parallel
Conflict Analysis to remove redundant arbitration can significantly improve the
performance of a large class of designs.

4.4 Avoiding Deadlock 

Deadlock occurs when there is a cycle of blocked processes, each waiting for a lock held by the next process in the cycle. In the context of SAFL, where functions represent hardware-level resources, a deadlocked cycle of resources can only occur if we permit cycles in the call-graph (i.e. if we permit mutual recursion). Note that we do not have to worry about self-tail-recursion, since it is treated as a local loop and does not require locks.

Although the details are beyond the scope of this paper, we show elsewhere how to deal with mutual recursion whilst avoiding deadlock. For the purposes of this paper it suffices to say that deadlock can be avoided simply by rejecting mutually recursive SAFL programs.
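Rejecting mutually recursive programs while still admitting self-tail-recursion amounts to a cycle check on the call-graph that ignores self-loops. The following is a minimal sketch under that reading; the call-graph representation and the function names are hypothetical, not part of the FLaSH system.

```python
from typing import Dict, Set

CallGraph = Dict[str, Set[str]]  # function -> functions it calls

def reachable(graph: CallGraph, start: str) -> Set[str]:
    """Functions reachable from `start` by following one or more call edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for callee in graph.get(stack.pop(), set()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def is_mutually_recursive(graph: CallGraph) -> bool:
    """True iff some call-graph cycle passes through two or more distinct functions.

    Self-loops (self-tail-recursion) are allowed: they become local loops.
    """
    return any(
        g != f and f in reachable(graph, g)
        for f, callees in graph.items()
        for g in callees
    )

fir_graph = {"FIR": {"FIR", "mult1", "mult2", "read_next_value", "write_value"}}
even_odd = {"even": {"odd"}, "odd": {"even"}}
print(is_mutually_recursive(fir_graph))  # False: only a self-loop
print(is_mutually_recursive(even_odd))   # True: would be rejected
```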



5 Examples and Results 

We provide three practical examples of applying Soft Scheduling to SAFL hardware designs. Each example illustrates a different point: Example 5.1 demonstrates that using static analysis to remove redundant arbiters is critical to achieving efficient circuits; Example 5.2 highlights the extra expressivity of Soft Scheduling over static scheduling techniques; Example 5.3 shows that dynamically controlling access to shared resources can lead to better performance than generating a single schedule statically.

5.1 Parallel FIR Filter 

Finite Impulse Response (FIR) filters are commonly used in Digital Signal Processing to remove certain frequencies from a discrete-time sampled signal. Assuming the existence of functions read_next_value and write_value, an integer arithmetic FIR filter can be described in SAFL as follows:

fun mult1(x,y) = x*y

fun mult2(x,y) = x*y

fun FIR(x,y,z,w) =
  let val o1 = mult1(x,2)
      val o2 = mult2(y,3)
      val next = read_next_value()
  in
    let val o3 = mult1(x,7)
        val o4 = mult2(y,9)
    in write_value(o1 + o2 + o3 + o4);
       FIR(y,z,w,next)
    end
  end

Recall that the semantics of the let statement requires all val-declarations to be computed fully before the body is executed. Although this design contains two shared combinatorial multipliers, mult1 and mult2, the outermost let statement ensures that the calls to the shared multipliers do not occur in parallel: the calls in the outer declarations complete before those in the inner declarations begin, and within each let the two calls go to different multipliers. As a result, Parallel Conflict Analysis infers that no arbitration is required.

The shared combinatorial multipliers, mult1 and mult2, take a single cycle to compute their result. Generating an arbiter for a shared resource adds an extra cycle of latency to each call (irrespective of whether the resource is busy at the time of the call). Thus, in this case, if we naively generated arbiters for all shared resources, the performance of the multipliers would be degraded by a factor of two.

This example illustrates the importance of using static analysis to remove 
redundant arbiters. For this design, using Parallel Conflict Analysis to remove 
unnecessary arbiters leads to a 50% speed increase over a policy which simply 
inserts arbiters on each shared resource. 

5.2 Shared-Memory Multi-processor Architecture 

Figure 4 contains SAFL code fragments describing a simple shared-memory multi-processor architecture. The system consists of two processors which have
type Instruction = {opcode: 4, operand: 12}
const WRITE=1, READ=0

extern Shared_memory (WriteSelect: 1, Address: 12, Data: 16) : 16
extern instruction_mem1 (Address: 12) : 16
extern instruction_mem2 (Address: 12) : 16

(* Processor 1: Loads instructions from instruction_mem1 *)
fun proc1(PC:12, RX:16, RY:16, A:16) : unit =
  let val instr: Instruction = instruction_mem1(PC)
      val incremented_PC = PC + 2
  in
    case instr.opcode of
        1 => (* Load Accumulator From Register *)
             if instr.operand = 1
             then proc1(incremented_PC, RX, RY, RX)
             else proc1(incremented_PC, RX, RY, RY)
      | 2 => (* Load Accumulator From Memory *)
             let val v = Shared_memory(READ, instr.operand, 0)
             in proc1(incremented_PC, RX, RY, v)
             end
      | 3 => (* Store Accumulator To Memory; accumulator unchanged *)
             (Shared_memory(WRITE, instr.operand, A);
              proc1(incremented_PC, RX, RY, A))
      ... etc
  end

(* Processor 2: Loads instructions from instruction_mem2 *)
fun proc2(PC:12, RX:16, RY:16, A:16) : unit =
  let val instr: Instruction = instruction_mem2(PC)
      val incremented_PC = PC + 2
  in
    case instr.opcode of
        2 => (* Load Accumulator From Memory *)
             let val v = Shared_memory(READ, instr.operand, 0)
             in proc2(incremented_PC, RX, RY, v)
             end
      | 3 => (* Store Accumulator To Memory; accumulator unchanged *)
             (Shared_memory(WRITE, instr.operand, A);
              proc2(incremented_PC, RX, RY, A))
      ... etc
  end

fun main() : unit = proc1(0,0,0,0) || proc2(0,0,0,0)

Fig. 4. Extracts from a SAFL Program Describing a Shared-Memory Multi-processor Architecture.
separate instruction memories but share a data memory. Such architectures are common in control-dominated embedded systems where multiple heterogeneous processors perform separate tasks using a common memory to synchronise on shared data structures.

The example starts by defining the type of instructions (records containing 4-bit opcodes and 12-bit operands), declaring two constants and specifying the signatures of various (externally defined) memory functions. Bit-widths are specified explicitly using the notation X : n to indicate that variable X represents an n-bit value. The bit-widths of function return values are also specified in this way (unit indicates a width of 0).

The function Shared_memory takes three arguments: WriteSelect indicates whether a read or a write is to be performed; Address specifies the memory location concerned; Data gives the value to be written (this argument is ignored if a read operation is performed). It always returns the value of memory location Address.

Functions proc1 and proc2 define two simple 16-bit processors. Argument PC represents the program counter, RX and RY represent processor registers and A is the accumulator. The processor state is updated on recursive calls; neither processor terminates.

The main function initialises the system by calling proc1 and proc2 in parallel with PC, RX, RY and A initialised to 0.

Since the SAFL code contains parallel non-terminating calls to proc1 and proc2, both of which share a single resource, neither static nor relative scheduling is applicable: this example cannot be synthesised using conventional silicon compilers.

Soft Scheduling is expressive enough to deal with non-terminating resources: a circuit is synthesised which contains an arbiter protecting the shared memory whilst allowing proc1 and proc2 to operate in parallel.

5.3 Parallel Tasks Sharing Graphical Display 

Consider a hardware design which can perform a number of tasks in parallel, with each task having the facility to update a graphical display. Many real-life systems have this structure. For example, in preparation for printing, an ink-jet printer performs a number of tasks in parallel: feed paper, reset the position of the print head, check ink levels, etc. Each one of these tasks can fail, in which case an error code is printed on the graphical display.

A controller for such a printer in SAFL may have the following structure:

extern display (data : 16) : unit

fun reset_head() : unit = ...
    if head_status <> 0 then
       display(4)   (* Error code 4 *)
    else ...

fun feed_paper() : unit = ... display(5)
fun check_ink() : unit = ... display(6)

fun main() : unit =
  (feed_paper() || reset_head() || check_ink());
  do_print();
  wait_for_next_job();
  main()



Let us assume that each of the tasks terminates in a statically bounded time. 
Given this assumption, both static scheduling and Soft Scheduling can be used 
to ensure mutually exclusive access to display. It is interesting to compare and 
contrast the circuits resulting from the application of these different techniques. 

Since the tasks may invoke a common resource, applying static scheduling techniques results in the tasks being serialised. In contrast, Soft Scheduling allows the tasks to operate in parallel and automatically generates an arbiter which dynamically schedules access to the shared display function.

Errors occur infrequently and hence contention for the display is rare. Under this condition, and assuming that the tasks all take roughly the same amount of time, Soft Scheduling yields a printer whose initialisation is three times faster than that of an equivalent statically scheduled printer. More generally, for a system with n balanced tasks, Soft Scheduling generates designs which are n times faster.
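The n-fold claim follows from comparing a sum of task durations with their maximum. The sketch below makes that arithmetic concrete under a deliberately simple timing model that is our assumption, not the paper's: a failing task requests the display once it finishes, the arbiter grants requests first-come first-served, and arbiter latency is ignored. All names and cycle counts are hypothetical.

```python
from typing import List

def static_time(durations: List[int], fails: List[bool], display_cycles: int) -> int:
    """Static schedule: tasks (and any error display) run one after another."""
    return sum(d + (display_cycles if f else 0) for d, f in zip(durations, fails))

def soft_time(durations: List[int], fails: List[bool], display_cycles: int) -> int:
    """Soft Scheduling: tasks run in parallel; an arbiter serialises display use.

    Hypothetical model: a failing task requests the display when it finishes,
    requests are served first-come first-served, arbiter latency is ignored.
    """
    finish = [d for d, f in zip(durations, fails) if not f]
    display_free_at = 0
    for request_time in sorted(d for d, f in zip(durations, fails) if f):
        start = max(request_time, display_free_at)
        display_free_at = start + display_cycles
        finish.append(display_free_at)
    return max(finish, default=0)

tasks = [100, 100, 100]  # feed paper, reset head, check ink (balanced)
print(static_time(tasks, [False] * 3, 10), soft_time(tasks, [False] * 3, 10))  # 300 100
print(static_time(tasks, [True] * 3, 10), soft_time(tasks, [True] * 3, 10))    # 330 130
```

With no errors the ratio is exactly 3; even in the worst case, where every task fails and contends for the display, the dynamically scheduled design in this model still finishes well ahead of the serialised one.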



6 Conclusions and Further Work 



Soft Scheduling is a powerful technique which provides a number of advantages 
over current scheduling technology: 

More expressive: in contrast to existing scheduling methods, Soft Scheduling can handle arbitrary networks of shared inter-dependent resources.
Increased efficiency: in some circumstances, controlling access to shared resources dynamically yields significantly better performance than statically choosing a single schedule (see Example 5.3).
Higher level of abstraction: current hardware synthesis paradigms require a designer to code arbiters explicitly at the structural level. Soft Scheduling abstracts mutual exclusion concerns completely, increasing the readability of source code without sacrificing efficiency.

One of the aims of the FLaSH Synthesis System is to facilitate the use of source-level program transformation in order to investigate a range of possible designs arising from a single specification. We have shown elsewhere that fold/unfold transformations can be applied to SAFL programs to explore various allocation/binding constraints, and we have also described a SAFL transformation to partition a design into hardware and software parts. The simplicity of our transformation system is partly due to the resource abstraction provided by Soft Scheduling: transformations involving shared resources would be much more complex if locking and arbitration details had to be considered at the SAFL level.

An arguable disadvantage of dynamic scheduling is that it makes the timing behaviour of the final circuit difficult to analyse. Since access to shared resources is resolved dynamically, it becomes much harder to prove that real-time design constraints are met. In future work we intend to investigate (a) the incorporation of timing directives into the SAFL language; and (b) the static analysis of timing properties for dynamically scheduled hardware systems.

When the parallel interleaving of non-terminating resources is required, dynamic scheduling is essential (see Example 5.2); in other cases dynamic scheduling can offer increased performance (see Example 5.3). However, for fine-grained sharing of smaller resources whose execution delays are known at compile-time (such as arithmetic units), static scheduling techniques are more appropriate. Soft Scheduling provides a powerful framework which strikes a compromise between the two approaches. The designer has the flexibility either:

- to describe a single static schedule (see Example 5.1), in which case dynamic arbitration is optimised away; or

- to leave scheduling details to the compiler (see Example 5.3), in which case dynamic arbitration is inserted where needed.
Abstract. We introduce a constraint-based framework for strictness
analysis applicable to ML style languages supporting higher-order func- 
tions, let-style polymorphism and algebraic data types. The analysis pro- 
duces strictness types for expressions in a program. A strictness type 
is defined using Boolean constraints. Perhaps surprisingly, the Boolean 
constraints that arise during analysis are in HORN, which makes the 
operations on them amenable to efficient implementation. 

We have implemented the approach within a highly optimising Haskell 
compiler (GHC) and give a comparison with the current strictness anal- 
yser of GHC. 



1 Introduction 

In lazy functional programming languages, expressions are evaluated only if they are needed. A function is strict in an argument if its value is undefined whenever the argument is undefined. Compilers can use knowledge of which arguments of a function are strict to improve program performance, by replacing call-by-need with call-by-value for those arguments. Hence there has been considerable interest in determining the strictness of functions.

Mycroft was the first person to consider strictness analysis of (first-order, flat-typed) functional programs. His approach to strictness analysis using abstract interpretation maps a function to a strictness function over Booleans. The undefined value is represented by false (which we shall write as $\bot$), while no information (all possible values, including the undefined value) is represented by true (written $\top$). The translation to a Boolean function preserves just the function's behaviour on undefined values. The result of evaluating this function gives a safe approximation of the real function's behaviour.

For example, the strictness type of $+$, which is strict in both its arguments, is represented by the function

$$+^\#\;x\;y = x \wedge y$$

where $\wedge$ is Boolean conjunction.

The strictness of $+$ in its first argument is shown by the fact that $+^\#\;\bot\;\top = \bot$. This means that if the first argument is undefined, the result of $+$ must be undefined.

The approach of Mycroft is pleasing, in that it simply maps each program 
construct to a Boolean version. So, for example, the functions 

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 73-E21 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
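$$\begin{array}{l}
f :: \mathit{Int} \mapsto \mathit{Int} \mapsto \mathit{Int} \mapsto \mathit{Bool} \mapsto \mathit{Int} \\
f\;x\;y\;z\;b = \texttt{if } b \texttt{ then } x + y \texttt{ else } z \\
g :: \mathit{Int} \mapsto \mathit{Int} \mapsto \mathit{Bool} \mapsto \mathit{Int} \\
g\;x\;y\;b = f\;x\;y\;x\;b
\end{array}$$

are represented by the Boolean strictness functions

$$\begin{array}{l}
f^\# : \mathit{Bool} \mapsto \mathit{Bool} \mapsto \mathit{Bool} \mapsto \mathit{Bool} \mapsto \mathit{Bool} \\
f^\#\;x\;y\;z\;b = b \wedge ((+^\#\;x\;y) \vee z) = b \wedge ((x \wedge y) \vee z) \\
g^\# : \mathit{Bool} \mapsto \mathit{Bool} \mapsto \mathit{Bool} \mapsto \mathit{Bool} \\
g^\#\;x\;y\;b = f^\#\;x\;y\;x\;b = b \wedge ((x \wedge y) \vee x) = b \wedge x
\end{array}$$

where $\vee$ is Boolean disjunction. Note that we use the notation $\mapsto$ for function types to differentiate them from Boolean implication $\rightarrow$.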

In general, a function $f$ is detected to be strict in an argument if $f^\#$ called with that argument $\bot$ and all other arguments $\top$ returns $\bot$. For example, $f$ is strict in $b$ but not strict in $x$, since $f^\#\;\top\;\top\;\top\;\bot = \bot$ and $f^\#\;\bot\;\top\;\top\;\top = \top$.
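Mycroft's test can be run mechanically by encoding $\bot$ as `False` and $\top$ as `True`. The sketch below hand-translates $+^\#$, $f^\#$ and $g^\#$ from the equations above into Python; the function names are illustrative only.

```python
from itertools import product

BOT, TOP = False, True  # ⊥ (undefined) is false, ⊤ (no information) is true

def plus_sharp(x: bool, y: bool) -> bool:
    return x and y

def f_sharp(x: bool, y: bool, z: bool, b: bool) -> bool:
    return b and (plus_sharp(x, y) or z)

def g_sharp(x: bool, y: bool, b: bool) -> bool:
    return f_sharp(x, y, x, b)

def strict_in(fn, arity: int, i: int) -> bool:
    """Strict in argument i iff fn(⊤, ..., ⊥ at position i, ..., ⊤) == ⊥."""
    args = [TOP] * arity
    args[i] = BOT
    return fn(*args) == BOT

print([strict_in(f_sharp, 4, i) for i in range(4)])  # [False, False, False, True]
print([strict_in(g_sharp, 3, i) for i in range(3)])  # [True, False, True]
# g# coincides with b ∧ x on all inputs, as derived above:
print(all(g_sharp(x, y, b) == (b and x)
          for x, y, b in product([False, True], repeat=3)))  # True
```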

The natural way to extend Mycroft's analysis to higher-order functions is to represent their strictness by higher-order Boolean functions, and this is indeed what Burn, Hankin and Abramsky defined. Unfortunately, in practice, manipulating higher-order Boolean function descriptions quickly becomes impractical.

Kuo and Mishra instead suggested using constrained type expressions to express strictness. We take a similar approach in our framework: the strictness of a function is represented by a Boolean relation, rather than a function. The relation is represented by a two-part description $(C, \tau)$, where $C$ is a Boolean constraint and $\tau$ is the type annotated with the names of the Boolean variables. These variables relate the Boolean constraints to the arguments and result of the function. We also write this in the style of a constrained type $\forall \bar{\delta}. C \Rightarrow \tau$, where $\bar{\delta}$ is a set of free variables not bound in the function's environment, $\tau$ is the type component and $C$ is the constraint component, restricting the set of possible instances of the variables $\bar{\delta}$.

For example, we represent $f$ above as

$$f : \forall b\,x\,y\,z\,r.\ (b \wedge ((x \wedge y) \vee z)) \leftrightarrow r \Rightarrow x \mapsto y \mapsto z \mapsto b \mapsto r$$

Once we have made the step to relational descriptions, a key realization is that only one direction of the constraint information is required to determine the strictness information. Since $\top$ indicates no information, the statement that $f^\#\;\top\;\top\;\bot\;\top = \top$ conveys no information. It is the fact that $f^\#\;\top\;\top\;\top\;\bot = \bot$ which demonstrates that $f$ is strict in $b$. This information is still captured in the weaker relational description

$$f : \forall b\,x\,y\,z\,r.\ (b \wedge ((x \wedge y) \vee z)) \rightarrow r \Rightarrow x \mapsto y \mapsto z \mapsto b \mapsto r$$

There are immediate advantages to keeping only the minimal amount of information:

- The resulting Boolean constraints are in HORN. For example, $(b \wedge ((x \wedge y) \vee z)) \rightarrow r$ is equivalent to $(b \rightarrow x \rightarrow y \rightarrow r) \wedge (b \rightarrow z \rightarrow r)$.¹ Algorithms for manipulating Horn formulae are more efficient than those for manipulating arbitrary Boolean formulae.

- The handling of if-then-else (and other "branching" constructs like case) is considerably simpler: the resulting strictness constraints can simply be conjoined. For example, consider the inference of a strictness type for $f$:

$$\begin{array}{ll}
x + y & : (x \rightarrow y \rightarrow r,\ r) \\
z & : (z \rightarrow r,\ r) \\
\texttt{if } b \texttt{ then } x + y \texttt{ else } z & : (b \rightarrow ((x \rightarrow y \rightarrow r) \wedge (z \rightarrow r)),\ r) \\
f & : \forall b\,x\,y\,z\,r.\ (b \rightarrow x \rightarrow y \rightarrow r) \wedge (b \rightarrow z \rightarrow r) \Rightarrow x \mapsto y \mapsto z \mapsto b \mapsto r
\end{array}$$

where for each subexpression we have written its strictness description on the right-hand side.
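The HORN observation rests on the identity $(D_1 \vee \cdots \vee D_m) \rightarrow r \equiv (D_1 \rightarrow r) \wedge \cdots \wedge (D_m \rightarrow r)$ for conjunctions $D_i$. The sketch below assumes the premise has already been put in disjunctive normal form by hand, checks the identity on all assignments for $f$, and then reads strictness off the Horn clauses by asking whether $r = \bot$ is still consistent. The clause representation and function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], str]  # (body, head): p1 -> ... -> pk -> head

def implication_to_horn(dnf: List[FrozenSet[str]], r: str) -> List[Clause]:
    """(D1 ∨ ... ∨ Dm) -> r  becomes  (D1 -> r) ∧ ... ∧ (Dm -> r)."""
    return [(d, r) for d in dnf]

def horn_holds(clauses: List[Clause], env: Dict[str, bool]) -> bool:
    return all(not all(env[v] for v in body) or env[head] for body, head in clauses)

def result_may_be_bottom(clauses: List[Clause], args: Dict[str, bool], r: str) -> bool:
    """Is r = ⊥ (False) consistent with the clauses for these argument values?"""
    return horn_holds(clauses, {**args, r: False})

# f: b ∧ ((x ∧ y) ∨ z) -> r, rewritten by hand in DNF as (b∧x∧y) ∨ (b∧z).
horn_f = implication_to_horn([frozenset({"b", "x", "y"}), frozenset({"b", "z"})], "r")

names = ["b", "x", "y", "z", "r"]
for bits in product([False, True], repeat=len(names)):  # all 32 assignments
    env = dict(zip(names, bits))
    original = not (env["b"] and ((env["x"] and env["y"]) or env["z"])) or env["r"]
    assert horn_holds(horn_f, env) == original

top = {"b": True, "x": True, "y": True, "z": True}
print(result_may_be_bottom(horn_f, {**top, "b": False}, "r"))  # True: strict in b
print(result_may_be_bottom(horn_f, {**top, "x": False}, "r"))  # False: not strict in x
```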

For functions with a first-order, flat type it is easy to show that our inference algorithm gives the same answers as those of Mycroft and Burn et al. (see Theorems 1, 3 and 4).

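Type-based strictness analyses, such as the approach of Kuo and Mishra and ours, extend to higher-order functions in a natural way. Descriptions just relate the Boolean variables representing the various parts of the type. For example, the strictness types of twice f x = f (f x), id x = x and c3 x = 3 are

$$\begin{array}{l}
\mathit{twice} : \forall \delta_1\delta_2\delta_3\delta_4.\ \delta_3 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_4 \Rightarrow (\delta_1 \mapsto \delta_2) \mapsto \delta_3 \mapsto \delta_4 \\
\mathit{id} : \forall \delta_1\delta_2.\ \delta_1 \rightarrow \delta_2 \Rightarrow \delta_1 \mapsto \delta_2 \\
\mathit{c3} : \forall \delta_1\delta_2.\ \delta_2 \Rightarrow \delta_1 \mapsto \delta_2
\end{array}$$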

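and the inference of strictness types for the expressions twice id and twice c3 is straightforwardly obtained by matching arguments, conjoining the resulting constraints and existentially quantifying the variables $\delta_1, \delta_2$, which no longer occur in the type. That is

$$\mathit{twice}\ \mathit{id} : (\delta_1 \rightarrow \delta_2 \wedge \delta_3 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_4,\ \delta_3 \mapsto \delta_4) = (\delta_3 \rightarrow \delta_4,\ \delta_3 \mapsto \delta_4)$$

$$\mathit{twice}\ \mathit{c3} : (\delta_2 \wedge \delta_3 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_4,\ \delta_3 \mapsto \delta_4) = (\delta_4,\ \delta_3 \mapsto \delta_4)$$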
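The two simplifications can be checked by brute force: enumerate the models of the conjoined constraint, project away $\delta_1$ and $\delta_2$ (existential quantification), and compare with the models of the claimed result. This truth-table sketch is only a sanity check for tiny examples, not how an efficient HORN solver would work; the variable spellings `d1`...`d4` and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, bool]
Constraint = Callable[[Env], bool]

def imp(a: bool, b: bool) -> bool:
    return (not a) or b

def project(c: Constraint, variables: List[str], keep: List[str]) -> Set[Tuple[bool, ...]]:
    """Models of (exists the variables not in keep). c, as tuples over `keep`."""
    hidden = [v for v in variables if v not in keep]
    result = set()
    for kept in product([False, True], repeat=len(keep)):
        for hid in product([False, True], repeat=len(hidden)):
            if c({**dict(zip(keep, kept)), **dict(zip(hidden, hid))}):
                result.add(kept)
                break
    return result

def models(c: Constraint, keep: List[str]) -> Set[Tuple[bool, ...]]:
    return {k for k in product([False, True], repeat=len(keep)) if c(dict(zip(keep, k)))}

def twice_c(e: Env) -> bool:  # δ3 → δ1 ∧ δ2 → δ1 ∧ δ2 → δ4
    return imp(e["d3"], e["d1"]) and imp(e["d2"], e["d1"]) and imp(e["d2"], e["d4"])

def id_c(e: Env) -> bool:  # δ1 → δ2
    return imp(e["d1"], e["d2"])

def c3_c(e: Env) -> bool:  # δ2
    return e["d2"]

ALL, KEEP = ["d1", "d2", "d3", "d4"], ["d3", "d4"]
twice_id = project(lambda e: id_c(e) and twice_c(e), ALL, KEEP)
twice_c3 = project(lambda e: c3_c(e) and twice_c(e), ALL, KEEP)
print(twice_id == models(lambda e: imp(e["d3"], e["d4"]), KEEP))  # True
print(twice_c3 == models(lambda e: e["d4"], KEEP))                # True
```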

Using Boolean constraints for strictness descriptions is advantageous since 
they are well understood, and operations like conjunction and existential quan- 
tification (both used in the above examples) are straightforward and have effi- 
cient implementations. Another advantage we claim for using well-understood 
constraint domains is that extensions of the system are easier to construct. Our 
approach to strictness analysis handles both let-style polymorphism and alge- 
braic data types. 

In this paper we present a strictness analyser for Haskell based on HORN constraints. It has been incorporated into the GHC Haskell compiler and handles the full Haskell 98 language. Although the implementation is still preliminary, it is capable of inferring much more precise strictness information than the current GHC strictness analyser.

¹ We shall assume for the remainder of the paper that $\rightarrow$ is right associative and has higher precedence than $\wedge$, and we usually write Boolean constraints without brackets, e.g. $b \rightarrow x \rightarrow y \rightarrow r \wedge b \rightarrow z \rightarrow r$.
The remainder of the paper is organized as follows. In the next section we define our strictness logic and its relation to the abstract interpretation framework of Burn et al. In particular, we prove equivalence to their approach on first-order, flat types. In Section 3 we give the inference algorithm for monomorphic programs. In Section 4 we extend the framework to handle let-style polymorphism and algebraic data types. In Section 5 we discuss our implementation in the GHC compiler and compare it with the current strictness analysis. Finally, we discuss related work and conclude.

2 Strictness Logic 

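For the moment, we assume that the underlying language is monomorphic:

$$\begin{array}{lrcl}
\text{Types} & t & ::= & \mathit{Int} \mid \mathit{Bool} \mid t \mapsto t \\
\text{Expressions} & e & ::= & x \mid \lambda x.e \mid e\ e \mid \texttt{let } x = e \texttt{ in } e \mid \texttt{fix } x \texttt{ in } e \mid \texttt{if } e \texttt{ then } e \texttt{ else } e
\end{array}$$

In later sections we will show how to incorporate ML-style polymorphism and algebraic data types.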

We define a strictness logic in terms of a constraint-based type system. Strictness properties of expressions are specified by typing judgments of the form $C, \Gamma \vdash e : \eta$, where $C$ is a strictness constraint, $\Gamma$ is a strictness type environment assigning strictness types to the free variables of $e$, and $\eta$ is $e$'s strictness type. We always assume that expression $e$ is well-typed in the underlying type system.

Strictness Types 

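The structure of strictness types reflects the structure of the underlying type system.

$$\begin{array}{lrcl}
\text{Annotations} & b & ::= & \delta \mid \bot \mid \top \\
\text{Strictness Types} & \tau & ::= & b \mid \tau \mapsto \tau \\
\text{Type Schemes} & \eta & ::= & \tau \mid \forall \bar{\delta}. C \Rightarrow \tau
\end{array}$$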

Our strictness analysis does not distinguish divergent functional expressions from functions that always diverge when applied to an argument. In rare cases this loses precision. If an analysis needs to distinguish the two, then a variable would also be attached to the $\mapsto$ type constructor. For example, this is required in our binding-time analysis [3].

We write $\bar{\delta} = fv(\eta)$ to refer to the free variables in a strictness type $\eta$. The constraint component $C$ will be described shortly.

Strictness Constraints 

Strictness constraints are in the class HORN of propositional formulas; additionally, we have conjunction and existential quantification. The language of constraints is as follows:

$$C ::= \bot \mid \top \mid \delta \mid \delta_1 \rightarrow \cdots \rightarrow \delta_n \mid C \wedge C \mid \exists \delta. C$$
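$$\text{(Var)}\ \frac{(x : \eta) \in \Gamma}{C, \Gamma \vdash x : \eta}
\qquad
\text{(Sub)}\ \frac{C, \Gamma \vdash e : \tau_2 \quad C \models \lceil(\tau_2 \leq_s \tau_1)\rceil}{C, \Gamma \vdash e : \tau_1}$$

$$\text{(Abs)}\ \frac{C, \Gamma_x.x : \tau_1 \vdash e : \tau_2}{C, \Gamma_x \vdash \lambda x.e : \tau_1 \mapsto \tau_2}
\qquad
\text{(App)}\ \frac{C, \Gamma \vdash e_1 : \tau_1 \mapsto \tau_2 \quad C, \Gamma \vdash e_2 : \tau_1}{C, \Gamma \vdash e_1\,e_2 : \tau_2}$$

$$\text{(Let)}\ \frac{C, \Gamma_x \vdash e_1 : \eta \quad C, \Gamma_x.x : \eta \vdash e_2 : \tau}{C, \Gamma_x \vdash \texttt{let } x = e_1 \texttt{ in } e_2 : \tau}
\qquad
\text{(Fix)}\ \frac{C, \Gamma_x.x : \eta \vdash e : \eta}{C, \Gamma_x \vdash \texttt{fix } x \texttt{ in } e : \eta}$$

$$\text{(If)}\ \frac{C, \Gamma \vdash e : \delta \quad C, \Gamma \vdash e_1 : \tau_1 \quad C, \Gamma \vdash e_2 : \tau_2 \quad C \models \delta \rightarrow \lceil(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)\rceil}{C, \Gamma \vdash \texttt{if } e \texttt{ then } e_1 \texttt{ else } e_2 : \tau}$$

$$(\forall\text{I})\ \frac{C \wedge D, \Gamma \vdash e : \tau \quad \bar{\delta} = fv(D, \tau) \setminus fv(\Gamma, C)}{C \wedge \exists \bar{\delta}.D, \Gamma \vdash e : \forall \bar{\delta}.D \Rightarrow \tau}
\qquad
(\forall\text{E})\ \frac{C, \Gamma \vdash x : \forall \bar{\delta}.D \Rightarrow \tau \quad C \models [\bar{b}/\bar{\delta}]D}{C, \Gamma \vdash x : [\bar{b}/\bar{\delta}]\tau}$$

$$(\exists\text{I})\ \frac{C, \Gamma \vdash e : \tau \quad \bar{\delta} \notin fv(\Gamma, \tau)}{\exists \bar{\delta}.C, \Gamma \vdash e : \tau}$$

Fig. 1. Strictness Typing Rules.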



where $\bot$ denotes false and $\top$ denotes true. We write $C \models D$ to denote model-theoretic entailment among constraints $C$ and $D$.

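Structural strictness constraints $(\tau \leq_s \tau')$ are translated as follows:

$$\lceil(\tau_1 \mapsto \tau_1' \leq_s \tau_2 \mapsto \tau_2')\rceil = \lceil(\tau_1' \leq_s \tau_2')\rceil \wedge \lceil(\tau_2 \leq_s \tau_1)\rceil$$

$$\lceil(\delta_1 \leq_s \delta_2)\rceil = \delta_1 \rightarrow \delta_2$$

Note that types $\tau$ and $\tau'$ in structural constraints $(\tau \leq_s \tau')$ must always be of the same shape. This condition is enforced by requiring expressions to be well-typed in the underlying type system.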
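The translation is a simple structural recursion: covariant in the result, contravariant in the argument, and a single implication at base annotations. The sketch below implements it together with the guarding step used by rule (If), which prefixes each implication with the condition variable and so keeps every clause in HORN. The `Fun` class, the string names for annotation variables, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Fun:  # strictness function type  arg ↦ res
    arg: "SType"
    res: "SType"

SType = Union[str, Fun]  # an annotation variable such as "d1", or a function type
Impl = Tuple[str, str]   # (a, b) stands for the clause a → b

def leq(t1: SType, t2: SType) -> List[Impl]:
    """Translate the structural constraint (t1 <=_s t2) into implications."""
    if isinstance(t1, str) and isinstance(t2, str):
        return [(t1, t2)]  # base case: δ1 → δ2
    if isinstance(t1, Fun) and isinstance(t2, Fun):
        # covariant in the result, contravariant in the argument
        return leq(t1.res, t2.res) + leq(t2.arg, t1.arg)
    raise ValueError("types of different shape")  # excluded by well-typing

def guarded(cond: str, impls: List[Impl]) -> List[Tuple[str, str, str]]:
    """δ → ((a1 → b1) ∧ ...) rewritten as the Horn clauses δ → ai → bi."""
    return [(cond, a, b) for a, b in impls]

print(leq(Fun("a1", "r1"), Fun("a2", "r2")))  # [('r1', 'r2'), ('a2', 'a1')]
# (If) side condition: branches δ1 ↦ δ2 and δ3 ↦ δ4 flowing into δ5 ↦ δ6
branches = leq(Fun("d1", "d2"), Fun("d5", "d6")) + leq(Fun("d3", "d4"), Fun("d5", "d6"))
print(guarded("d", branches))
# [('d', 'd2', 'd6'), ('d', 'd5', 'd1'), ('d', 'd4', 'd6'), ('d', 'd5', 'd3')]
```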

Typing Rules 

Figure 1 defines the strictness logic. Most of its rules are straightforward. $\Gamma_x$ denotes the environment obtained from $\Gamma$ by excluding variable $x$. Rule (If) handles conditional expressions. Note that well-typing in the underlying system ensures that expression $e$ is of type Bool and that expressions $e_1$ and $e_2$ share the same underlying type. Therefore, their strictness types are of the same shape. The rule states that if the condition $e$ is undefined, i.e. $C \models \delta \leftrightarrow \bot$, then the resulting type $\tau$ is unconstrained. If we have no information about the condition, i.e. $C \models \delta \leftrightarrow \top$, then we build the least upper bound of the strictness descriptions of $e_1$ and $e_2$. Note that the constraint $\delta \rightarrow \lceil(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)\rceil$ is in HORN. Rules $(\forall\text{I})$ and $(\forall\text{E})$ handle construction and elimination of polyvariant strictness types. Note that for now we only allow for polymorphism in the annotation variables (polyvariance). In Section 4 we show how to allow for polymorphism in the underlying language. Rule $(\exists\text{I})$ allows unnecessary strictness type variables in the constraint component of a typing judgment to be hidden.



Relationship to Abstract Interpretation 

We relate our strictness logic to the strictness analysis by abstract interpretation of Burn et al. We show that for first-order programs our strictness logic is equivalent and for higher-order programs our logic is sound, which establishes the correctness of our analysis.

Strictness analysis by abstract interpretation interprets well-typed expressions in the complete partial order $V$, where $V$ is the least solution of the equation

$$V = 2 + [V \rightarrow V]$$

where $2$ is the two-point lattice $\{\bot, \top\}$ such that $\bot \leq \top$.

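The abstraction of an expression $e$ of type $\mathit{Bool} \mapsto \mathit{Int} \mapsto \mathit{Int}$ would then be a function $[\![e]\!] :: 2 \mapsto 2 \mapsto 2$. The function $[\![\cdot]\!]$ on expressions is defined as follows (where $\rho$ is a mapping from variables to values):

$$\begin{array}{rcl}
[\![x]\!]\rho &=& \rho(x) \\
[\![\lambda u.e]\!]\rho &=& \lambda v.[\![e]\!]\rho[u := v] \\
[\![e\,e']\!]\rho &=& [\![e]\!]\rho\,([\![e']\!]\rho) \\
[\![\texttt{let } x = e \texttt{ in } e']\!]\rho &=& [\![e']\!]\rho[x := [\![e]\!]\rho] \\
[\![\texttt{fix } x \texttt{ in } e]\!]\rho &=& \mathit{fix}(\lambda v.[\![e]\!]\rho[x := v]) \\
[\![\texttt{if } e \texttt{ then } e_1 \texttt{ else } e_2]\!]\rho &=& \begin{cases} \bot & \text{if } [\![e]\!]\rho = \bot \\ [\![e_1]\!]\rho \sqcup [\![e_2]\!]\rho & \text{otherwise} \end{cases}
\end{array}$$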

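The meaning of strictness types is defined in terms of ideals in $V$. Ideals are non-empty, downward-closed and limit-closed subsets of $V$.

The function $[\![\cdot]\!]$ on closed strictness types is defined below. We say a type $\tau$ is a monotype if $fv(\tau) = \emptyset$. We let $\mu$ range over monotypes and $\nu$ over $\bot$ and $\top$.

$$\begin{array}{rcl}
[\![\bot]\!] &=& \{\bot\} \\
[\![\top]\!] &=& \{\bot, \top\} \\
[\![\mu \mapsto \mu']\!] &=& \{ f \in V \mid \forall v \in [\![\mu]\!].\ f\,v \in [\![\mu']\!] \} \\
[\![\forall \bar{\delta}. C \Rightarrow \tau]\!] &=& \bigcap \{ [\![ [\bar{\nu}/\bar{\delta}]\tau ]\!] \mid\ \models [\bar{\nu}/\bar{\delta}]C \}
\end{array}$$

Note that $[\![\eta]\!]$ is an ideal for a closed strictness type $\eta$. The strictness type system is sound, i.e. every satisfiable constraint has a monotype solution. The $(\leq_s)$ relation is coherent, i.e. if a type $\tau$ subsumes a type $\tau'$, written $\models \lceil(\tau \leq_s \tau')\rceil$, then the denotation of $\tau$ in the ideal model is a subset of the denotation of $\tau'$.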

Our first result concerns only first-order programs. We can show that our 
HORN-based strictness system is equivalent to Burn et al.’s abstract interpre- 
tation approach for first-order programs. 

We observe that Burn et al.'s strictness descriptions for first-order functions, which are Boolean functions, can be viewed as constrained types in the Boolean domain. For example, the function $+$ is described by the Boolean function $\delta_1 \mapsto \delta_2 \mapsto (\delta_1 \wedge \delta_2)$, which in turn can be viewed as $\forall \bar{\delta}.(\delta_1 \wedge \delta_2) \leftrightarrow \delta_3 \Rightarrow \delta_1 \mapsto \delta_2 \mapsto \delta_3$. We can be more specific. Every first-order function $f$ can be described by $\forall \bar{\delta}. P \leftrightarrow \delta_n \Rightarrow \delta_1 \mapsto \cdots \mapsto \delta_n$, where $P$ is a positive Boolean formula and the class of positive Boolean formulas (POS) is defined as:

$$P ::= \bot \mid \top \mid \delta \mid P \wedge P \mid P \vee P \mid \exists \delta. P$$

We further observe that we can weaken the strictness description of first-order 
functions without losing any information. 

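Lemma 1 (Equivalence between HORN and POS Types). Let $P$ be a Boolean expression in POS. Let $\eta = \forall \bar{\delta}. P \leftrightarrow \delta_n \Rightarrow \delta_1 \mapsto \cdots \mapsto \delta_n$ be a polymorphic strictness type such that $fv(\eta) = \emptyset$. Then $P \rightarrow \delta_n$ is in HORN and

$$[\![\forall \bar{\delta}. P \leftrightarrow \delta_n \Rightarrow \delta_1 \mapsto \cdots \mapsto \delta_n]\!] = [\![\forall \bar{\delta}. P \rightarrow \delta_n \Rightarrow \delta_1 \mapsto \cdots \mapsto \delta_n]\!]$$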

This allows us to establish the result that HORN-based inference is sound and complete with respect to Burn et al.'s abstract interpretation approach for first-order programs.

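A variable environment $\rho$ models a closed typing environment $\Gamma$, written $\rho \models \Gamma$, if for all $x : \eta \in \Gamma$, $\rho(x) \in [\![\eta]\!]$. We write $\Gamma \models e : I$ iff for all $\rho$ where $\rho \models \Gamma$ we have that $[\![e]\!]\rho \in I$, where $I$ is an ideal.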

Theorem 1 (Soundness and Completeness of First-Order Programs). Let $e$ be a first-order program, $\eta$ a strictness type, and $\Gamma$ a strictness environment such that $fv(\Gamma, \eta) = \emptyset$. Then $\mathit{true}, \Gamma \vdash e : \eta$ iff $\Gamma \models e : I$, where $I = [\![\eta]\!]$.

For higher-order programs we can say that we are sound with respect to Burn et al.

Theorem 2 (Soundness of Higher-Order Programs). Let $C, \Gamma \vdash e : \eta$ be a valid typing judgment. Let $\phi$ be a substitution such that $\phi\Gamma$ and $\phi\eta$ are closed and $\models \phi C$. Let $\rho$ be a variable environment such that $\rho \models \phi\Gamma$. Then $[\![e]\!]\rho \in [\![\phi\eta]\!]$.



3 Inference 



We assume that we are given a well-typed program, each subexpression annotated with its underlying type. We write $(e :: t)$ to denote that expression $e$ has the underlying type $t$.

Strictness inference computes the missing strictness information for a type-annotated expression $(e :: t)$. We first compute the shape of $e$'s strictness type from the underlying type $t$. We introduce judgments of the form $t \vdash \tau$, which state that $\tau$ is a strictness type with the same shape as $t$. The rules are as follows:



(Bool) 



5 is fresh 

Bool F 5 



(Int) 



5 is fresh 

Int h 5 



, . , p F n t2 F T2 

(Arrow) 

t\ t2 Ti T 2 
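The shape judgment $t \vdash \tau$ simply copies the structure of $t$ and places a fresh annotation variable at every base type. A minimal sketch follows, assuming underlying types are encoded as the strings `"Int"`/`"Bool"` or nested tuples `("->", t1, t2)` and fresh variables come from a generator; these encodings and the name `shape` are hypothetical.

```python
import itertools
from typing import Iterator, Tuple, Union

UType = Union[str, Tuple[str, "UType", "UType"]]  # "Int", "Bool" or ("->", t1, t2)
SType = Union[str, Tuple[str, "SType", "SType"]]  # "d1", ... or ("->", τ1, τ2)

def shape(t: UType, fresh: Iterator[str]) -> SType:
    """t ⊢ τ: copy the shape of t, with a fresh annotation at each base type."""
    if t in ("Int", "Bool"):                   # rules (Int) and (Bool)
        return next(fresh)
    if isinstance(t, tuple) and t[0] == "->":  # rule (Arrow)
        return ("->", shape(t[1], fresh), shape(t[2], fresh))
    raise ValueError(f"unknown underlying type: {t!r}")

fresh_vars = (f"d{i}" for i in itertools.count(1))
print(shape(("->", "Int", ("->", "Bool", "Int")), fresh_vars))
# ('->', 'd1', ('->', 'd2', 'd3'))
```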
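$$\text{(Var)}\ \frac{(x : \forall \bar{\delta}.C \Rightarrow \tau) \in \Gamma \quad \bar{\delta}' \text{ new}}{\Gamma, x :: t \vdash_{\mathit{inf}} ([\bar{\delta}'/\bar{\delta}]C,\ [\bar{\delta}'/\bar{\delta}]\tau)}
\qquad
\text{(Var-}\lambda\text{)}\ \frac{(x : \tau) \in \Gamma \quad t \vdash \tau' \quad C = \lceil(\tau \leq_s \tau')\rceil}{\Gamma, x :: t \vdash_{\mathit{inf}} (C, \tau')}$$

$$\text{(Abs)}\ \frac{t_1 \vdash \tau_1 \quad \Gamma_x.x : \tau_1, e :: t_2 \vdash_{\mathit{inf}} (C, \tau_2)}{\Gamma_x, \lambda x.e :: t_1 \mapsto t_2 \vdash_{\mathit{inf}} (C, \tau_1 \mapsto \tau_2)}$$

$$\text{(App)}\ \frac{\Gamma, e_1 :: t_1 \vdash_{\mathit{inf}} (C_1, \tau_1) \quad \Gamma, e_2 :: t_2 \vdash_{\mathit{inf}} (C_2, \tau_2) \quad t_3 \vdash \tau_3 \quad C = C_1 \wedge C_2 \wedge \lceil(\tau_1 \leq_s \tau_2 \mapsto \tau_3)\rceil}{\Gamma, ((e_1 :: t_1)(e_2 :: t_2)) :: t_3 \vdash_{\mathit{inf}} (C, \tau_3)}$$

$$\text{(Let)}\ \frac{\Gamma_x, e_1 :: t_1 \vdash_{\mathit{inf}} (C_1, \tau_1) \quad gen(C_1, \Gamma_x, \tau_1) = (C_0, \eta_0) \quad \Gamma_x.x : \eta_0, e_2 :: t_2 \vdash_{\mathit{inf}} (C_2, \tau_2) \quad C = C_0 \wedge C_2}{\Gamma_x, (\texttt{let } x = (e_1 :: t_1) \texttt{ in } e_2) :: t_2 \vdash_{\mathit{inf}} (C, \tau_2)}$$

$$(\exists\text{Intro})\ \frac{\Gamma, e :: t \vdash_{\mathit{inf}} (C, \tau) \quad \bar{\delta} = fv(C) \setminus fv(\Gamma, \tau)}{\Gamma, e :: t \vdash_{\mathit{inf}} (\exists \bar{\delta}.C, \tau)}$$

$$\text{(If)}\ \frac{\Gamma, e :: \mathit{Bool} \vdash_{\mathit{inf}} (C_0, \delta) \quad t \vdash \tau \quad \Gamma, e_1 :: t \vdash_{\mathit{inf}} (C_1, \tau_1) \quad \Gamma, e_2 :: t \vdash_{\mathit{inf}} (C_2, \tau_2) \quad C = C_0 \wedge C_1 \wedge C_2 \wedge (\delta \rightarrow \lceil(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)\rceil)}{\Gamma, (\texttt{if } e \texttt{ then } e_1 \texttt{ else } e_2 :: t) \vdash_{\mathit{inf}} (C, \tau)}$$

$$\text{(Fix)}\ \frac{t \vdash \tau \quad \bar{\delta} = fv(\tau) \quad \eta_0 = \forall \bar{\delta}.\mathit{true} \Rightarrow \tau \quad F(\Gamma_x.x : \eta_0, e :: t) = (C, \tau)}{\Gamma_x, (\texttt{fix } x :: t \texttt{ in } e) :: t \vdash_{\mathit{inf}} (C, \tau)}$$

Fig. 2. Strictness Inference.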



In Figure 2 we define the inference algorithm, which is formulated as a deduction system over clauses of the form

$$\Gamma, e :: t \vdash_{\mathit{inf}} (C, \tau)$$

with a strictness environment $\Gamma$ and a type-annotated expression $e :: t$ as input, and a strictness constraint $C$ and a strictness type $\tau$ as output. We note that an algorithm in the style of W can be straightforwardly derived from this specification.

All rules are syntax-directed except rule $(\exists\text{Intro})$. This rule is justified by the corresponding rule in the logical system and can be applied at any stage of the inference process. Rule (Var-$\lambda$) handles lambda-bound variables and rule (Var) handles the instantiation of let-bound variables. Rule (Let) introduces annotation polymorphism. We define a generalisation function giving the generalised type scheme and the generalised constraint. Let $C$ be a constraint, $\Gamma$ a type environment, and $\tau$ a type. Then

$$gen(C, \Gamma, \tau) = (\exists \bar{\delta}.C,\ \forall \bar{\delta}.C \Rightarrow \tau)$$

where $\bar{\delta} = fv(C, \tau) \setminus fv(\Gamma)$. Note that we push the whole constraint $C$ into the type scheme (we could be more efficient by pushing in only the affected constraints).

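In rule (If), the constraint $\delta \rightarrow \lceil(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)\rceil$, where $\lceil(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)\rceil = (\delta_1 \rightarrow \delta_1') \wedge \ldots \wedge (\delta_n \rightarrow \delta_n')$, translates to $\delta \rightarrow ((\delta_1 \rightarrow \delta_1') \wedge \ldots \wedge (\delta_n \rightarrow \delta_n'))$, which is equivalent to $\delta \rightarrow \delta_1 \rightarrow \delta_1' \wedge \ldots \wedge \delta \rightarrow \delta_n \rightarrow \delta_n'$. Hence, all constraints we generate are in HORN.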

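With (Fix), we follow Dussart, Henglein and Mossin [2], performing a Kleene-Mycroft iteration until a fixpoint is found. We define $(\forall \bar{\delta}_1.C_1 \Rightarrow \tau_1) \leq (\forall \bar{\delta}_2.C_2 \Rightarrow \tau_2)$ iff $C_2 \models \exists \bar{\delta}_1.(C_1 \wedge \lceil(\tau_1 \leq_s \tau_2)\rceil)$. Without loss of generality we assume there are no name clashes between $\bar{\delta}_1$ and $\bar{\delta}_2$. Define

$$F(\Gamma_x.x : \eta_i, e :: t) = \begin{cases} F(\Gamma_x.x : \eta_{i+1}, e :: t) & \text{if } \eta_i < \eta_{i+1} \\ (C, \tau) & \text{if } \eta_i = \eta_{i+1} \end{cases}$$

where $\Gamma_x.x : \eta_i, e :: t \vdash_{\mathit{inf}} (C, \tau)$ and $(\_, \eta_{i+1}) = gen(C, \Gamma_x.x : \eta_i, \tau)$. The sequence $\eta_0 \leq \ldots \leq \eta_i \leq \ldots$ is finite: it is increasing, and since every $\eta_i$ has a type component of the shape of $t$, there are only finitely many inequivalent Boolean constraints over its annotation variables.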

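Example 1. Let $t = \mathit{Int} \mapsto \mathit{Int} \mapsto \mathit{Int} \mapsto \mathit{Int}$. Consider the inference for $km :: t$.

km x y z = if z == 1 then x - y else km y x (z - 1)

Let $\Gamma = \{x : \delta_x, y : \delta_y, z : \delta_z, km : \forall \delta_1\delta_2\delta_3\delta_4.\ \mathit{true} \Rightarrow \delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4\}$ plus strictness types for 1, == and $-$.

$$\begin{array}{ll}
\text{(Var)} & \Gamma, 1 :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_5,\ \delta_5) \\
\text{(App)} & \Gamma, z == 1 :: \mathit{Bool} \vdash_{\mathit{inf}} (\delta_z \rightarrow \delta_5 \rightarrow \delta_6 \wedge \delta_5,\ \delta_6) \\
(\exists\text{Intro}) & \Gamma, z == 1 :: \mathit{Bool} \vdash_{\mathit{inf}} (\delta_z \rightarrow \delta_6,\ \delta_6) \\
\text{(App)}(\exists\text{Intro}) & \Gamma, z - 1 :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_z \rightarrow \delta_7,\ \delta_7) \\
\text{(App)}(\exists\text{Intro}) & \Gamma, x - y :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_x \rightarrow \delta_y \rightarrow \delta_8,\ \delta_8) \\
\text{(Var)} & \Gamma, km :: t \vdash_{\mathit{inf}} (\mathit{true},\ \delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4) \\
\text{(App)} & \Gamma, km\ y\ x\ (z-1) :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_7 \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_9 \wedge \delta_z \rightarrow \delta_7,\ \delta_9) \\
\text{(If)} & \Gamma, \texttt{if} \ldots :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_7 \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_9 \wedge \delta_z \rightarrow \delta_7 \\
 & \qquad \wedge\ \delta_z \rightarrow \delta_6 \wedge \delta_x \rightarrow \delta_y \rightarrow \delta_8 \wedge \delta_6 \rightarrow \delta_8 \rightarrow \delta_r \wedge \delta_6 \rightarrow \delta_9 \rightarrow \delta_r,\ \delta_r) \\
(\exists\text{Intro}) & \Gamma, \texttt{if} \ldots :: \mathit{Int} \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_z \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_z \rightarrow \delta_r \wedge \delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r,\ \delta_r) \\
\text{(Abs)}\times 3 & \Gamma, km :: t \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_z \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_z \rightarrow \delta_r \wedge \delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r, \\
 & \qquad \delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r) \\
(\exists\text{Intro}) & \Gamma, km :: t \vdash_{\mathit{inf}} (\delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r,\ \delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r)
\end{array}$$

In the last step we obtain the first approximation of km in the fixpoint iteration.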



We can evaluate the next iteration by adding the constraint $\delta_1 \rightarrow \delta_2 \rightarrow \delta_3 \rightarrow \delta_4$
to the constraints from the (Var) step onward. This time, in the last step, the
elimination of the variables $\{\delta_1, \delta_2, \delta_3, \delta_4\}$ results in the same description. Hence the
fixpoint is reached and we have

$$\mathit{km} : \forall\delta_x\delta_y\delta_z\delta_r.\,(\delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r) \Rightarrow \delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r$$

That is, km is strict in all arguments.
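The fixpoint of Example 1 can be cross-checked semantically. The Python sketch below is our own illustration, not the inference algorithm: it assumes the reading True = "possibly defined" and False = ⊥ (consistent with $x - y$ contributing $\delta_x \rightarrow \delta_y \rightarrow \delta_8$), models km's strictness behaviour as a truth table over (x, y, z), starts the Kleene iteration from the everywhere-⊥ function (our reading of the initial assumption $\mathit{true} \Rightarrow \delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4$ with $\delta_4$ at its least value), and unfolds the body of km until the table stops changing. The helper names are hypothetical.

```python
from itertools import product

ARGS = list(product([False, True], repeat=3))  # every assignment to (x, y, z)

def table(f):
    """Truth table of a three-argument Boolean function."""
    return tuple(f(*a) for a in ARGS)

def from_table(t):
    lookup = dict(zip(ARGS, t))
    return lambda x, y, z: lookup[(x, y, z)]

def step(km):
    """Unfold km x y z = if z == 1 then x - y else km y x (z - 1) once.

    True means "possibly defined", False means bottom. z == 1 and z - 1 are
    defined only if z is; x - y needs both x and y; if needs its condition.
    """
    def body(x, y, z):
        return z and ((x and y) or km(y, x, z))
    return body

def kleene(step_fn, start):
    """Iterate step_fn from start until the truth table is stable."""
    current, iterations = start, 0
    while True:
        nxt = step_fn(current)
        iterations += 1
        if table(nxt) == table(current):
            return current, iterations
        current = from_table(table(nxt))

bottom = lambda x, y, z: False          # first approximation: km is undefined everywhere
km_fix, n = kleene(step, bottom)
print(table(km_fix) == table(lambda x, y, z: x and y and z), n)  # True 2
```

The table stabilises at $x \wedge y \wedge z$ after two unfoldings, which is the least model of $\delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r$ and mirrors the two rounds of inference above.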

We can prove soundness and completeness results for well-typed programs.
Soundness states that every deduction derived by the inference system can also
be derived in the logical system:

Theorem 3 (Soundness of Inference). Let $\Gamma, e :: t \vdash_{\mathit{inf}} (C, \tau)$. Then $C, \Gamma \vdash e : \tau$.

Completeness states that every deduction derivable in the logical system is
subsumed by a deduction in the inference system:

Theorem 4 (Completeness of Inference). Let $C, \Gamma \vdash e : \forall\bar\delta_1.C_1 \Rightarrow \tau_1$ and let
(e :: t) be the type-annotated version of expression e such that $\tau_1$ is of shape t.
Then $\Gamma, e :: t \vdash_{\mathit{inf}} (C_2, \tau_2)$ for some $C_2, \tau_2, \bar\delta_2$ such that $\bar\delta_2 = \mathit{fv}(C_2, \tau_2) \setminus \mathit{fv}(\Gamma)$
and $C \wedge C_1 \vdash \exists\bar\delta_2.(C_2 \wedge [\![(\tau_2 \leq_s \tau_1)]\!])$.

3.1 Constraint-Based Fixpoints 

If we are willing to give up some accuracy in the presence of polymorphic recursion
in strictness types, then we can replace the (Fix) rule by a simpler rule which
just ensures that the constraints generated will be above the least fixpoint. This
is similar to the (FIX) rule of Kuo and Mishra [7] (they only gave a constraint-based
fixpoint rule, no iterative fixpoint rule).

The following rule forces all recursive invocations to have the same strictness
type, which must be a super-type of the strictness type of the recursively defined
expression.

$$\frac{t \vdash \tau' \qquad \Gamma_x.x : \tau', e :: t \vdash_{\mathit{inf}} (C_1, \tau) \qquad t \vdash \tau'' \qquad C = C_1 \wedge [\![(\tau \leq_s \tau')]\!] \wedge [\![(\tau \leq_s \tau'')]\!]}{\Gamma_x, (\mathbf{fix}\ x :: t\ \mathbf{in}\ e) :: t \vdash_{\mathit{inf}} (C, \tau'')}\ (\mathrm{FixC})$$

The “shortcut” of stipulating $(\tau \leq_s \tau')$ may have the unfortunate side-effect of
introducing constraints amongst the arguments to a function. An example of
this phenomenon appears in Example 2 below. A simple way of eliminating such
constraints is to couch the result in terms of a fresh strictness type $\tau''$, with
the constraint $(\tau \leq_s \tau'')$.

This approach may lose accuracy, since it forces the strictness of the ar- 
guments in the recursive call to be the same as the strictness for the overall 
function. It finds a correct but imprecise strictness type for the function. 

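Example 2. We illustrate the constraint-based fixpoint rule on km, starting from
the second last inference step of Example 1. Let $\tau' = \delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4$ and
$\tau'' = \delta''_x \mapsto \delta''_y \mapsto \delta''_z \mapsto \delta''_r$. The (FixC) rule proceeds by adding the constraints
$[\![(\delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r \leq_s \delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4)]\!]$ and
$[\![(\delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r \leq_s \delta''_x \mapsto \delta''_y \mapsto \delta''_z \mapsto \delta''_r)]\!]$
to those inferred in the second last inference step of Example 1, as follows:

$\Gamma, \mathit{km} :: t \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_z \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_z \rightarrow \delta_r \wedge \delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r,\ \delta_x \mapsto \delta_y \mapsto \delta_z \mapsto \delta_r)$

(FixC) $\Gamma, (\mathbf{fix}\ \mathit{km}\ \mathbf{in}\ \ldots) :: t \vdash_{\mathit{inf}} (\delta_y \rightarrow \delta_1 \wedge \delta_x \rightarrow \delta_2 \wedge \delta_z \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_z \rightarrow \delta_r \wedge \delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r \wedge \delta_1 \rightarrow \delta_x \wedge \delta_2 \rightarrow \delta_y \wedge \delta_3 \rightarrow \delta_z \wedge \delta_r \rightarrow \delta_4 \wedge \delta''_x \rightarrow \delta_x \wedge \delta''_y \rightarrow \delta_y \wedge \delta''_z \rightarrow \delta_z \wedge \delta_r \rightarrow \delta''_r,\ \tau'')$

(∃Intro) $\Gamma, (\mathbf{fix}\ \mathit{km}\ \mathbf{in}\ \ldots) :: t \vdash_{\mathit{inf}} (\delta''_x \rightarrow \delta''_y \rightarrow \delta''_z \rightarrow \delta''_r,\ \tau'')$

Note the spurious consequence $\delta_x \leftrightarrow \delta_y$ that arises on the variables of $\tau$ in
addition to the correct answer $\delta_x \rightarrow \delta_y \rightarrow \delta_z \rightarrow \delta_r$. This is eliminated when we
couch the results in terms of the fresh type $\tau''$.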

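For this example, the constraint-based fixpoint leads to no loss of accuracy,
since the strictness of the two arguments x and y, which were forced to be equal,
was already equal. Notice that this is an example where the approach of Kuo
and Mishra [7] loses accuracy.

The rule is not complete with respect to the logic in Section 2. The result using (FixC)
for the function h x y b = if b then x else h x (h y x b) b is

$$h : \forall\delta_x\delta_y\delta_b\delta_r.\,(\delta_b \rightarrow \delta_x \rightarrow \delta_r \wedge \delta_b \rightarrow \delta_y \rightarrow \delta_r) \Rightarrow \delta_x \mapsto \delta_y \mapsto \delta_b \mapsto \delta_r$$

rather than $h : \forall\delta_x\delta_y\delta_b\delta_r.\,(\delta_b \rightarrow \delta_x \rightarrow \delta_r) \Rightarrow \delta_x \mapsto \delta_y \mapsto \delta_b \mapsto \delta_r$. However, the (FixC) rule
is sound.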

Theorem 5 (Soundness of FixC). Let $\Gamma, e :: t \vdash_{\mathit{inf}} (C, \tau)$ using (FixC).
Then $C, \Gamma \vdash e : \tau$.



4 Polymorphism and Algebraic Data Types 

Any realistic program analysis needs to be able to handle additional language 
features such as polymorphism and algebraic data types. We first extend our 
strictness analysis to ML-style polymorphism. Thereafter, we describe our han- 
dling of algebraic data types. 



4.1 Polymorphism 

In previous work [3] we have already described how to extend Dussart et al.’s
binding-time analysis [2] to programs which are polymorphic in their underlying
types. Fortunately, most of the methods and concepts introduced for the
binding-time analysis of polymorphic programs carry over to the strictness analysis
described in this paper.
We assume that the underlying type language is now polymorphic, as follows:

Types          $t ::= \alpha \mid \mathit{Int} \mid \mathit{Bool} \mid t \mapsto t$

Type Schemes   $\sigma ::= t \mid \forall\bar\alpha.t$

Example 3. Consider the following polymorphic program:

sel :: $\forall\alpha_1\alpha_2.\,\mathit{Bool} \mapsto (\alpha_1 \mapsto \alpha_2) \mapsto (\alpha_1 \mapsto \alpha_2) \mapsto \alpha_1 \mapsto \alpha_2$
sel b f g x = if b then f x else g x

which either applies function f or function g to value x, depending on the boolean
expression b.

The strictness description of sel is as follows: 

We use the convention that variables $\beta$ refer to strictness variables which
correspond to a polymorphic variable in the underlying system. For example,
strictness variables $\beta_{11}$, $\beta_{12}$ and $\beta_{13}$ correspond to variable $\alpha_1$. Note that although
we still use HORN constraints, constraints of the form $\beta_{13} \rightarrow \beta_{11}$ now
express structural relationships between instances of $\beta_{13}$ and $\beta_{11}$. In a system
without Boolean constraints, we would write $(\beta_{13} \leq_s \beta_{11})$ instead. The advantage
of using Booleans is that we do not need to introduce any new primitive
constraints. For example, the constraint $\delta \rightarrow \beta_{21} \rightarrow \beta_{23}$ states that if $\delta = \top$ then
$(\beta_{21} \rightarrow \beta_{23})$ must hold; otherwise the relationship between instances of $\beta_{21}$ and
$\beta_{23}$ is unaffected.

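Assume that we apply function sel in the following context:

$\mathit{sel}' = \mathit{sel} \mathbin{@} (\mathit{Int} \mapsto \mathit{Int})\ (\mathit{Int} \mapsto \mathit{Int})$

$\mathit{sel}'\ b\ f\ g\ e :: \mathit{Int} \mapsto \mathit{Int}$        $(C, \tau)$
$b :: \mathit{Bool}$        $(C_b, \delta_b)$
$f :: (\mathit{Int} \mapsto \mathit{Int}) \mapsto (\mathit{Int} \mapsto \mathit{Int})$        $(C_f, (\delta_{f1} \mapsto \delta_{f2}) \mapsto (\delta_{f3} \mapsto \delta_{f4}))$
$g :: (\mathit{Int} \mapsto \mathit{Int}) \mapsto (\mathit{Int} \mapsto \mathit{Int})$        $(C_g, (\delta_{g1} \mapsto \delta_{g2}) \mapsto (\delta_{g3} \mapsto \delta_{g4}))$
$e :: \mathit{Int} \mapsto \mathit{Int}$        $(C_e, \delta_{e1} \mapsto \delta_{e2})$

where @ denotes polymorphic type application, instantiating polymorphic variables
$\alpha_1$ and $\alpha_2$ with $\mathit{Int} \mapsto \mathit{Int}$. Note that we have type-annotated all expressions
and written all strictness descriptions on the right-hand side.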

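We obtain the strictness description $(C, \tau)$ of expression sel' b f g e as follows.
We perform the following substitutions on the type component of sel:

$\beta_{11} = \delta_{f1} \mapsto \delta_{f2}$        $\beta_{21} = \delta_{f3} \mapsto \delta_{f4}$
$\beta_{12} = \delta_{g1} \mapsto \delta_{g2}$        $\beta_{22} = \delta_{g3} \mapsto \delta_{g4}$
$\beta_{13} = \delta_{e1} \mapsto \delta_{e2}$        $\beta_{23} = \delta_1 \mapsto \delta_2$

which gives $\tau = \beta_{23} = \delta_1 \mapsto \delta_2$. On the constraint side, it must hold that all
relations which hold for polymorphic strictness variables carry over to their instances.
For example, the constraint $\beta_{13} \rightarrow \beta_{11}$ generates the constraint
$[\![(\delta_{e1} \mapsto \delta_{e2} \leq_s \delta_{f1} \mapsto \delta_{f2})]\!]$, and $\delta \rightarrow \beta_{21} \rightarrow \beta_{23}$ generates the constraint
$\delta \rightarrow [\![(\delta_{f3} \mapsto \delta_{f4} \leq_s \delta_1 \mapsto \delta_2)]\!]$. We find

$$C = \begin{array}{l} \delta_{f1} \rightarrow \delta_{e1} \wedge \delta_{e2} \rightarrow \delta_{f2} \wedge \delta_{g1} \rightarrow \delta_{e1} \wedge \delta_{e2} \rightarrow \delta_{g2} \wedge {} \\ \delta \rightarrow \delta_1 \rightarrow \delta_{f3} \wedge \delta \rightarrow \delta_{f4} \rightarrow \delta_2 \wedge {} \\ \delta \rightarrow \delta_1 \rightarrow \delta_{g3} \wedge \delta \rightarrow \delta_{g4} \rightarrow \delta_2 \wedge {} \\ C_b \wedge C_f \wedge C_g \wedge \delta_b \rightarrow \delta \end{array}$$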
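The translation $[\![(\tau \leq_s \tau')]\!]$ used to instantiate these structural constraints can be sketched as a small recursive function: two annotations give one implication, and function types are contravariant in the argument and covariant in the result. This Python sketch is ours (the names Fun, arrows, sub and guarded are hypothetical); it reproduces the instantiated constraints for $\beta_{13} \rightarrow \beta_{11}$ and $\delta \rightarrow \beta_{21} \rightarrow \beta_{23}$ found above, representing a guarded constraint $\delta \rightarrow [\![\ldots]\!]$ by prefixing each clause with the guard.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Fun:
    """A strictness function type arg ↦ res."""
    arg: "SType"
    res: "SType"

SType = Union[str, Fun]  # a plain string is an annotation variable such as "e1"

def arrows(*parts: SType) -> SType:
    """Right-nested function type p1 ↦ p2 ↦ ... ↦ pn."""
    result = parts[-1]
    for p in reversed(parts[:-1]):
        result = Fun(p, result)
    return result

def sub(t1: SType, t2: SType) -> List[Tuple[str, str]]:
    """Translate (t1 <=s t2) into implications (a, b), each read as a → b."""
    if isinstance(t1, str) and isinstance(t2, str):
        return [(t1, t2)]
    if isinstance(t1, Fun) and isinstance(t2, Fun):
        # contravariant in the argument, covariant in the result
        return sub(t2.arg, t1.arg) + sub(t1.res, t2.res)
    raise ValueError("types of different shape")

def guarded(guard: str, clauses):
    """δ → [(...)] as triples (δ, a, b), each read as δ → a → b."""
    return [(guard, a, b) for a, b in clauses]

print(sub(arrows("e1", "e2"), arrows("f1", "f2")))
# [('f1', 'e1'), ('e2', 'f2')]
print(guarded("d", sub(arrows("f3", "f4"), arrows("1", "2"))))
# [('d', '1', 'f3'), ('d', 'f4', '2')]
```

The first line corresponds to $\delta_{f1} \rightarrow \delta_{e1} \wedge \delta_{e2} \rightarrow \delta_{f2}$ and the second to $\delta \rightarrow \delta_1 \rightarrow \delta_{f3} \wedge \delta \rightarrow \delta_{f4} \rightarrow \delta_2$ in $C$.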

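In general, the method for building an instance of a strictness type scheme
is as follows. We first relate polymorphic strictness variables to corresponding
polymorphic variables in the underlying system. To this purpose, we introduce
judgments $\Delta \vdash \tau : t$ where $\Delta$ is a shape environment associating strictness
variables $\beta$ to type variables $\alpha$. Judgments $\Delta \vdash \tau : t$ are derived as follows:

$(\mathit{Bool}_\delta)\ \Delta \vdash \delta : \mathit{Bool}$        $(\mathit{Bool}_\top)\ \Delta \vdash \top : \mathit{Bool}$        $(\mathit{Bool}_\bot)\ \Delta \vdash \bot : \mathit{Bool}$

$$(\beta)\ \frac{(\beta : \alpha) \in \Delta}{\Delta \vdash \beta : \alpha} \qquad\qquad (\mathit{Arrow})\ \frac{\Delta \vdash \tau_1 : t_1 \quad \Delta \vdash \tau_2 : t_2}{\Delta \vdash \tau_1 \mapsto \tau_2 : t_1 \mapsto t_2}$$

For brevity, we omit rules $(\mathit{Int}_\delta)$, $(\mathit{Int}_\top)$ and $(\mathit{Int}_\bot)$.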

We note that shape inference is decidable. Given a strictness type $\tau$ with
underlying type $t$, we denote by $\tau, t \vdash \Delta$ the algorithm to infer the most general
shape environment $\Delta$. Given a shape environment $\Delta$ and underlying type $t$, we
denote by $\Delta, t \vdash \tau$ the algorithm to infer the most general strictness type $\tau$. We
refer to [3] for more details.

Example 4. Coming back to the example above, relating the strictness description
of sel with its underlying type, we find that
$\Delta = \{\beta_{11} : \alpha_1, \beta_{12} : \alpha_1, \beta_{13} : \alpha_1, \beta_{21} : \alpha_2, \beta_{22} : \alpha_2, \beta_{23} : \alpha_2\}$.

Consider a polymorphic expression with strictness type $\forall\bar\beta,\bar\delta.C \Rightarrow \tau$ and
underlying type $\forall\bar\alpha.t$. Apply this expression to a sequence of (underlying)
instantiation types $\bar t$. First, compute the shape environment $\Delta$ from $\tau, t \vdash \Delta$.
Then, for each pair $(\beta_{ij}, \alpha_i)$ where $\Delta \vdash \beta_{ij} : \alpha_i$, we generate a fresh $\tau_{ij}$ such
that $\Delta, t_i \vdash \tau_{ij}$. This gives us a sequence of (strictness) instantiation types $\bar\tau$.
Finally, we must build the instantiated constraint $[\bar\tau/\bar\beta]C$.
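The two shape algorithms can be sketched as follows. In this Python sketch (our own tuple encoding; infer_shape plays the role of $\tau, t \vdash \Delta$ and fresh_of_shape that of $\Delta, t \vdash \tau$ for monomorphic $t$), we assume for illustration that the type component of sel's strictness description is $\delta \mapsto (\beta_{11} \mapsto \beta_{21}) \mapsto (\beta_{12} \mapsto \beta_{22}) \mapsto \beta_{13} \mapsto \beta_{23}$, which is consistent with the substitutions of Example 3 but is not spelled out above. The sketch recovers $\Delta$ of Example 4 and generates fresh instantiation types for $\alpha_1 = \alpha_2 = \mathit{Int} \mapsto \mathit{Int}$.

```python
import itertools

def fn(a, b):
    return ("fn", a, b)

def infer_shape(tau, t, delta=None):
    """tau, t |- Delta: record which type variable each beta stands for."""
    delta = {} if delta is None else delta
    if tau[0] == "beta" and t[0] == "tv":
        delta[tau[1]] = t[1]
    elif tau[0] == "fn" and t[0] == "fn":
        infer_shape(tau[1], t[1], delta)
        infer_shape(tau[2], t[2], delta)
    elif not (tau[0] == "ann" and t[0] == "base"):
        raise ValueError(f"{tau} does not have shape {t}")
    return delta

def fresh_of_shape(t, names):
    """Delta, t |- tau for a monomorphic t: same shape, fresh annotation variables."""
    if t[0] == "base":
        return ("ann", next(names))
    if t[0] == "fn":
        return fn(fresh_of_shape(t[1], names), fresh_of_shape(t[2], names))
    raise ValueError("expected a monomorphic instantiation type")

def instantiate(delta, inst, names):
    """Generate a fresh tau_ij for every beta_ij, shaped like the type bound to alpha_i."""
    return {beta: fresh_of_shape(inst[alpha], names)
            for beta, alpha in sorted(delta.items())}

BOOL, INT = ("base", "Bool"), ("base", "Int")
a1, a2 = ("tv", "a1"), ("tv", "a2")

def beta(name):
    return ("beta", name)

# Underlying type of sel and an assumed strictness type of the same shape.
sel_t = fn(BOOL, fn(fn(a1, a2), fn(fn(a1, a2), fn(a1, a2))))
sel_tau = fn(("ann", "d"), fn(fn(beta("b11"), beta("b21")),
                              fn(fn(beta("b12"), beta("b22")),
                                 fn(beta("b13"), beta("b23")))))

delta = infer_shape(sel_tau, sel_t)
names = (f"d{i}" for i in itertools.count())
taus = instantiate(delta, {"a1": fn(INT, INT), "a2": fn(INT, INT)}, names)
print(taus["b11"])  # ('fn', ('ann', 'd0'), ('ann', 'd1'))
```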

For each strictness constraint involving polymorphic variables, we need to
generate the appropriate instantiated constraints. We observe that there are two
possible kinds of polymorphic constraints:

(1) $\delta_1 \rightarrow \cdots \rightarrow \delta_n \rightarrow \beta \rightarrow \beta'$

(2) $\delta_1 \rightarrow \cdots \rightarrow \delta_n \rightarrow \beta$

where n possibly equals zero. We have already seen constraints of the first
kind. Constraints of the second kind state that if we have no information about
$\delta_1, \ldots, \delta_n$ (that is, $\delta_1 = \cdots = \delta_n = \top$), then we do not have any information about
$\beta$ either. Such constraints prove to be useful if our analysis is not able to state
any results.
Example 5. Consider an imported function im :: $\forall\alpha.\alpha \mapsto \alpha \mapsto \alpha$ with no analysis
information available. The best we can do is to assign im the following strictness
type

$$\mathit{im} : \forall\beta_1\beta_2\beta_3.\,\beta_3 \Rightarrow \beta_1 \mapsto \beta_2 \mapsto \beta_3$$

which states that we do not have any information about im’s strictness behaviour.



We proceed by querying which relations hold and add in the appropriate
constraints. The structure of the constraints allows us to avoid testing for certain
combinations. For example, given a constraint $\delta \rightarrow \beta \rightarrow \beta'$ we know that
variables $\delta$ and $\beta$ must appear in a negative position and variable $\beta'$ in a positive
position of the type component.

We define polarities recursively: $\beta$ appears in positive position in $\tau$ (written
$\tau[\beta^+]$) iff $\tau = \beta$, or else $\tau = \tau_1 \mapsto \tau_2$ and either $\tau_1[\beta^-]$ or $\tau_2[\beta^+]$. Similarly, $\beta$
appears in negative position in $\tau$ (written $\tau[\beta^-]$) iff $\tau = \tau_1 \mapsto \tau_2$ and $\tau_1[\beta^+]$ or
$\tau_2[\beta^-]$.
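The polarity definitions translate directly into two mutually recursive functions. In this Python sketch (our own encoding: a variable is a string and $\tau_1 \mapsto \tau_2$ is the tuple ("fn", τ1, τ2)), the strictness type of im from Example 5 has $\beta_1$ and $\beta_2$ in negative and $\beta_3$ in positive position; in a higher-order type such as $(\beta_1 \mapsto \beta_2) \mapsto \beta_3$, $\beta_1$ ends up positive.

```python
def positive(beta, tau):
    """tau[beta+]: tau = beta, or tau = t1 -> t2 with t1[beta-] or t2[beta+]."""
    if isinstance(tau, str):
        return tau == beta
    _, t1, t2 = tau
    return negative(beta, t1) or positive(beta, t2)

def negative(beta, tau):
    """tau[beta-]: tau = t1 -> t2 with t1[beta+] or t2[beta-]."""
    if isinstance(tau, str):
        return False
    _, t1, t2 = tau
    return positive(beta, t1) or negative(beta, t2)

im = ("fn", "b1", ("fn", "b2", "b3"))     # β1 ↦ β2 ↦ β3
print([(b, positive(b, im), negative(b, im)) for b in ("b1", "b2", "b3")])
# [('b1', False, True), ('b2', False, True), ('b3', True, False)]
```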

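We define the query relation for the first kind of constraints as follows:

$$\mathcal{T}_1(\Delta, \tau, \bar\tau) = \bigwedge \left\{ \delta_1 \rightarrow \cdots \rightarrow \delta_l \rightarrow [\![(\tau_{ij} \leq_s \tau_{ik})]\!] \;\middle|\; \begin{array}{l} \Delta \vdash \beta_{ij} : \alpha_i,\ \tau[\beta_{ij}^-],\ \Delta \vdash \beta_{ik} : \alpha_i,\ \tau[\beta_{ik}^+], \\ \tau[\delta_1^-], \ldots, \tau[\delta_l^-],\ C \models \delta_1 \rightarrow \cdots \rightarrow \delta_l \rightarrow \beta_{ij} \rightarrow \beta_{ik} \end{array} \right\}$$

In a similar style, we handle constraints of the second kind. We define

$$|\beta| = \beta \qquad |\delta| = \delta \qquad |\tau \mapsto \tau'| = |\tau'|$$

Then, the query relation for the second kind of constraints is as follows:

$$\mathcal{T}_2(\Delta, \tau, \bar\tau) = \bigwedge \left\{ \delta_1 \rightarrow \cdots \rightarrow \delta_n \rightarrow |\tau_{ij}| \;\middle|\; \begin{array}{l} \Delta \vdash \beta_{ij} : \alpha_i,\ \tau[\beta_{ij}^{\pm}], \\ \tau[\delta_1^-], \ldots, \tau[\delta_n^-],\ C \models \delta_1 \rightarrow \cdots \rightarrow \delta_n \rightarrow \beta_{ij} \end{array} \right\}$$

Note that it is sufficient to just constrain the final result of a function.
For example, assume that we have $\delta \rightarrow \beta$ and $\beta$ is instantiated by
$(\delta_1 \mapsto \delta_2) \mapsto (\delta_3 \mapsto \delta_4)$. Then, it is sufficient to add in $\delta \rightarrow \delta_4$ only.

Both query relations can be implemented efficiently, which ensures the effectiveness
of analysing polymorphically typed programs. We write

$$\mathcal{T}(\Delta, \tau, \bar\tau) = \mathcal{T}_1(\Delta, \tau, \bar\tau) \wedge \mathcal{T}_2(\Delta, \tau, \bar\tau)$$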
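The final-result operation $|\tau|$, and the observation that a second-kind constraint only needs to constrain the final result, can be sketched in a few lines of Python (our own encoding: annotation variables are strings and $\tau_1 \mapsto \tau_2$ is ("fn", τ1, τ2); the function names are hypothetical). It reproduces the example above, in which $\delta \rightarrow \beta$ with $\beta := (\delta_1 \mapsto \delta_2) \mapsto (\delta_3 \mapsto \delta_4)$ yields only $\delta \rightarrow \delta_4$.

```python
def final_result(tau):
    """|beta| = beta, |delta| = delta, |t -> t'| = |t'|."""
    while isinstance(tau, tuple):
        tau = tau[2]   # follow the result side of ("fn", arg, res)
    return tau

def instantiate_second_kind(guards, tau):
    """delta1 -> ... -> deltan -> beta with beta := tau becomes delta1 -> ... -> |tau|."""
    return (tuple(guards), final_result(tau))

tau = ("fn", ("fn", "d1", "d2"), ("fn", "d3", "d4"))   # (δ1 ↦ δ2) ↦ (δ3 ↦ δ4)
print(instantiate_second_kind(["d"], tau))             # (('d',), 'd4')
```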

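The resultant inference rule for polymorphic application is as follows:

$$\frac{(x : \forall\bar\beta,\bar\delta.C \Rightarrow \tau) \in \Gamma \quad t' = [\bar t/\bar\alpha]t \quad \tau, t \vdash \Delta \quad \text{for each } \Delta \vdash \beta_{ij} : \alpha_i \text{ generate } \Delta, t_i \vdash \tau_{ij} \quad \ldots}{\Gamma, ((x :: \forall\bar\alpha.t) \mathbin{@} \bar t) :: t' \vdash_{\mathit{inf}} (C', \tau')}\ (\mathrm{Var\text{-}Inst})$$

where $(x :: \forall\bar\alpha.t) \mathbin{@} \bar t$ denotes instantiation of polymorphic variables $\bar\alpha$ by types $\bar t$.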
We have omitted the details of building strictness types which are polymorphic
in annotation and underlying types. It is straightforward to adjust the
corresponding rules (Let), (∀I) and (∃I). We refer to previous work [3] for a
detailed development.

4.2 Algebraic Data Types 

We first show how to handle non-recursive data types¹ such as

data Maybe α = Just α | Error Int

which translates to the following strictness data type:

$$\mathit{Maybe}^{\delta_1}\ \beta\ \delta_2 = \mathit{Just}\ \beta \mid \mathit{Error}\ \delta_2$$

The top-most annotation $\delta_1$ tells us whether the data type is totally undefined
(when $\delta_1 = \bot$), or (when $\delta_1 = \top$) the constructor may be known, but some of the
arguments of the constructor are possibly undefined. The strictness variable $\beta$
represents the strictness behaviour of $\alpha$, and $\delta_2$ represents the strictness behaviour
of Error’s integer argument.

Value constructors are translated as follows:

$\mathit{Just} : \forall\beta,\delta_1,\delta_2.\,\delta_1 \Rightarrow \beta \mapsto \mathit{Maybe}^{\delta_1}\ \beta\ \delta_2$
$\mathit{Error} : \forall\beta,\delta_1,\delta_2.\,\delta_1 \Rightarrow \delta_2 \mapsto \mathit{Maybe}^{\delta_1}\ \beta\ \delta_2$

Note that for Just we do not constrain $\delta_2$, and for Error we do not constrain $\beta$.
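In general, a non-recursive data type

$$\mathbf{data}\ T\ \bar\alpha = k_1\ \bar t_1 \mid \cdots \mid k_n\ \bar t_n$$

translates to

$$T^{\delta}\ \bar\beta\ \bar\delta = k_1\ \bar\tau_{k_1} \mid \cdots \mid k_n\ \bar\tau_{k_n}$$

where each $\alpha_i$ is represented by $\beta_i$, $\bar\tau_{k_i}$ results from $\Delta, \bar t_i \vdash \bar\tau_{k_i}$ with
$\Delta = \{\beta_i : \alpha_i\}$, and $\bar\delta$ are the freshly introduced annotation variables in $\bar\tau_{k_1}, \ldots, \bar\tau_{k_n}$.
Each value constructor $k_i : \forall\bar\alpha.\,t_{i1} \mapsto \cdots \mapsto t_{im} \mapsto T\ \bar\alpha$ translates to

$$\forall\bar\beta,\bar\delta.\,\delta \Rightarrow \tau_{i1} \mapsto \cdots \mapsto \tau_{im} \mapsto T^{\delta}\ \bar\beta\ \bar\delta$$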

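We find the following inference rule for case expressions:

$$\frac{\begin{array}{c} \Gamma, e :: T\ \bar t' \vdash_{\mathit{inf}} (C_0, T^{\delta'}\ \bar\tau'\ \bar\delta') \qquad \bar\tau', \bar t' \vdash \Delta \qquad \Delta, \bar t' \vdash \bar\tau'' \qquad \Delta, t \vdash \tau \qquad \bar\delta''\ \text{fresh} \\ \Gamma.\bar x_i : [\bar\tau''/\bar\beta, \bar\delta''/\bar\delta]\bar\tau_{k_i}, e_i :: t \vdash_{\mathit{inf}} (C_i, \tau'''_i) \quad \text{for } i = 1..n \\ C = C_0 \wedge \bigwedge_{i=1}^{n} C_i \wedge [\![(\bar\tau' \leq_s \bar\tau'')]\!] \wedge [\![(\bar\delta' \leq_s \bar\delta'')]\!] \wedge \delta' \rightarrow \bigwedge_{i=1}^{n} [\![(\tau'''_i \leq_s \tau)]\!] \end{array}}{\Gamma, (\mathbf{case}\ e :: T\ \bar t'\ \mathbf{of}\ k_1\ \bar x_1 \rightarrow e_1 \mid \cdots \mid k_n\ \bar x_n \rightarrow e_n) :: t \vdash_{\mathit{inf}} (C, \tau)}\ (\mathrm{Case})$$

¹ We note that non-recursive data types could be encoded as functions; this would
give the same strictness information.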
For recursive algebraic data types, we only consider the strictness properties
of the top-most constructor. In effect, all information we have about recursive
components is discarded. For example, the list description $[\beta]^{\delta}$ is interpreted as
follows: $\delta$ describes the strictness behaviour of the topmost list constructor (so if it is
$\bot$ then the whole list is $\bot$), while $\beta$ describes the strictness of the element in
the first cons cell. So, if a list-consuming function is strict in $\delta$, then it is safe to
evaluate the list as far as its first constructor (i.e. to weak head normal form)
before the function is applied. If the function is also strict in $\beta$, then the element
in the head of the list can also be evaluated before the function is applied.

There have been many proposals for finding more accurate information for recursive
data types, usually in the context of lists: for example, head-strictness [?],
spine-strictness (the shape of the expression is required) and full-strictness (all
components of the expression are required) [?]. We are currently investigating
extensions to our analysis method to support these properties.

5 Implementation and Empirical Results 

We have built a prototype implementation of the strictness analysis presented
here for the Haskell compiler GHC. The analysis engine is written in Haskell,
while the Boolean constraints are handled by a specialized HORN constraint
solver written in C, which represents Horn clauses as graphs with directed hyper-edges.
It supports the operations of conjunction, existential quantification, copying
and renaming. For polymorphic application we provide specialized support
for efficiently determining (a minimal set of) conjunctions $(\delta_1 \wedge \cdots \wedge \delta_n)$ such
that $(\delta_1 \wedge \cdots \wedge \delta_n) \rightarrow (\beta_1 \rightarrow \beta_2)$, based on finding paths from $\beta_1$ to $\beta_2$.
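Deciding whether $(\delta_1 \wedge \cdots \wedge \delta_n) \rightarrow (\beta_1 \rightarrow \beta_2)$ follows from a set of definite Horn clauses is a reachability question: starting from $\delta_1, \ldots, \delta_n$ and $\beta_1$, can $\beta_2$ be derived by firing clauses whose bodies are already derived? The Python sketch below uses naive forward chaining over (body, head) hyperedges; it is a simplified stand-in for illustration, not the C solver described above, and its names are hypothetical. The clauses are the constraint $C$ computed for sel' in Example 3, omitting the unknown $C_b$, $C_f$, $C_g$.

```python
def closure(clauses, facts):
    """All variables forced true by facts; clauses are (body, head) meaning AND(body) -> head."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known

def entails(clauses, guards, b1, b2):
    """C |= (guards) -> (b1 -> b2): is b2 reachable from guards plus b1?"""
    return b2 in closure(clauses, set(guards) | {b1})

def horn(*body, head):
    return (frozenset(body), head)

C = [horn("f1", head="e1"), horn("e2", head="f2"),
     horn("g1", head="e1"), horn("e2", head="g2"),
     horn("d", "d1", head="f3"), horn("d", "f4", head="d2"),
     horn("d", "d1", head="g3"), horn("d", "g4", head="d2"),
     horn("b", head="d")]

print(entails(C, ["b"], "d1", "f3"))   # True: δb → δ1 → δf3
print(entails(C, [], "d1", "f3"))      # False: needs the guard δb (or δ)
```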

GHC’s current strictness analysis [?] is based on the abstract interpretation
approach of Burn et al. [1], made practical by a widening operation which eliminates
all but weak-head normal form information about higher-order types and
algebraic data types with more than a single constructor. The strictness analysis
produces a strictness description for functional variables, which is used by a later
pass to optimise these functions safely. The description assigns each argument
one of S (for strict in weak head normal form), L (for lazy) and U(d1, ..., dn) for
a strict argument of a product type (one with only a single constructor), where di
is the strictness description for argument i of the constructor. For example, the
strictness description for + is SS (for unboxed integers), for snd it is U(LS) (strict
in the pair and its second part), and for f (from the introduction) it is LLLS (strict
only in h). These strictness descriptions are written to an interface file for use in
analysing modules which import analysed functions.

This approach appears very weak: it loses all information about application
of functional arguments. This would seem very unfortunate, because even simple
Haskell programs often have implicit higher-order arguments because of type
classes and their translation to dictionary passing. The simple function double x
= x + x with type double :: Num a => a -> a becomes, after processing, something
like (for simplicity the Num dictionary is drastically simplified)

doubleCore :: (a -> a -> a, a -> a -> a) -> a -> a
doubleCore (plus, minus) x = plus x x

and the GHC strictness description discovered is U(SL)L (strict in the plus part
of the dictionary, but not in x). This apparent weakness is often avoided by
aggressive inlining. Hence, although we might expect doubleint = (Int) double to
be given an L strictness description (no information), this only happens if inlining
of double is switched off. Inlining of double plus dictionary specialization results
in GHC’s current analyser finding the desired S strictness description.

Currently, we do not write or read our strictness descriptions to/from the
interface file; instead, we use GHC’s strictness info to build a (weak) Boolean
strictness description for imported functions.

We can show that we are at least as accurate as GHC’s current strictness
analyser by mapping our Boolean strictness types to the simple sequences above
and comparing the number of S and U annotations that occur. Note that this
mapping throws away a considerable amount of the strictness information we
have in our descriptions. We find all the strictness results that GHC finds. In
addition, we often improve GHC’s information.
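As an illustration of such a mapping, under assumptions of our own that are not spelled out above (first-order functions over base types, and the least-model reading of the Horn constraints in which True means "possibly defined"), argument i can be reported as S when making $\delta_i$ undefined, while every other argument stays possibly defined, no longer forces the result annotation. The Python sketch below applies this test to km and to the (Fix) and (FixC) results for h from Section 3.1; its function names are hypothetical.

```python
def closure(clauses, facts):
    """Least model of definite Horn clauses (body, head) extended with facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return known

def ghc_style(clauses, args, result):
    """Map a first-order Boolean strictness type to an S/L string, one letter per argument."""
    letters = []
    for a in args:
        others = [b for b in args if b != a]      # a is bottom, the rest are possibly defined
        letters.append("L" if result in closure(clauses, others) else "S")
    return "".join(letters)

km = [(("x", "y", "z"), "r")]                      # δx → δy → δz → δr
h_fix = [(("b", "x"), "r")]                        # δb → δx → δr
h_fixc = [(("b", "x"), "r"), (("b", "y"), "r")]    # plus the spurious δb → δy → δr

print(ghc_style(km, ["x", "y", "z"], "r"))         # SSS
print(ghc_style(h_fix, ["x", "y", "b"], "r"))      # SLS
print(ghc_style(h_fixc, ["x", "y", "b"], "r"))     # LLS
```

The last two lines show, on a tiny example, how the extra clause introduced by (FixC) can turn an S into an L.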



Table 1. Comparison with GHC Built-In Strictness Analysis.

                                     GHC       Fix             FixC
Program    Functions  Lines  Comp (secs)   %+Time  Strict   %+Time   Strict
bspt             271   2141        29.21    3473%      71    1193%     4-47
cacheprof        331   2151        37.64     662%       4     111%        4
compress2         19    198         4.49      88%       -      94%      0-4
ebnf2ps          485   2927        34.78    1394%      42      77%    41-34
fem              103   1286        27.19      38%       2      37%      6-7
fluid            278   2401        34.74     124%      22      52%    18-50
gamteb            48    701        15.83      30%       4      24%      4-8
gg               292    812        19.81      89%       9      40%    12-20
grep             148    356         5.11     150%       2      83%     8-10
hidden           233    521        17.44      49%      63      36%    64-15
hpg              319   2067        15.32    2419%     133     416%    14-96
infer            398    584        13.33     237%      12      84%    14-80
lift            1943   2043         7.75       5%       1       0%     3-35
maillist          10    175         1.59      51%       4      32%      9-9
mkhprog           75    803         4.56      76%      15      44%     0-19
pic               96    527        13.26      30%      25      23%     5-11
Prolog           164    539         7.01     227%      22      60%    14-31
reptile          205   1522        20.35      55%      14      37%     4-14
rsa               14     74         2.03      11%       1      11%        1
symalg            38   1146        16.79     212%       2       8%      5-8
imaginary         97    318        11.03     163%      15      47%    17-20
spectral        3385  21709       240.66     703%     384      44%  145-397

Our experimental results are shown in Table 1, giving the benchmark name (from
the nofib suite), size (in number of functions and lines), and original GHC
compilation time. The data for each fixpoint rule are: the increase in compilation time
when our strictness analysis phase is added, and the increase in the number of U and
S annotations found. Using (Fix) we are uniformly more accurate than GHC,
but this is not the case for (FixC). An entry x-y indicates x U+S annotations
found by (FixC) where GHC found L, and y U+S annotations found by GHC
where (FixC) found L. The two smaller nofib benchmark suites, imaginary and
spectral, are each shown as one summary line for space reasons.
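To put the percentages in absolute terms, and assuming (as the description above suggests) that %+Time is measured relative to GHC's original compilation time, the extra seconds spent in the analysis are comp × percent / 100. A short Python check for bspt and spectral (the function name is ours):

```python
def extra_seconds(ghc_secs, percent_increase):
    """Added compile time, reading %+Time as relative to the GHC compile time."""
    return ghc_secs * percent_increase / 100

print(round(extra_seconds(29.21, 3473), 2))   # bspt with (Fix):      1014.46
print(round(extra_seconds(29.21, 1193), 2))   # bspt with (FixC):     348.48
print(round(extra_seconds(240.66, 703), 1))   # spectral with (Fix):  1691.8
```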

The purpose of the comparison is mainly to show that the analysis is feasible
on a large and varied suite of programs. Our approach would already improve
GHC’s current approach. We emphasize that the descriptions we are calculating
carry much more strictness information.

Overall, we were surprised by the accuracy of the current analysis for GHC 
given its simplicity. Clearly, the GHC implementors have ensured that the strict- 
ness analysis phase is well-integrated into the compiler as a whole and pro- 
vides exactly the information they require. Of course, in the comparison we are 
favouring GHC by only comparing on the information it collects. By writing and 
reading our strictness information to/from the interface file we should be able 
to improve our accuracy significantly. Better handling of recursive data types 
should also lead to much more accuracy. The (FixC) approach currently loses 
too much strictness information. Investigations so far suggest that (FixC) loses 
considerable information when the recursive call has an always constructed al- 
gebraic data type. Unfortunately, since basic types, such as Ints, are boxed this 
is very common. We are investigating how to avoid this information loss. 

The analysis times are large, but not infeasible. At present, the large anal- 
ysis times seem to be caused by very large algebraic data types in combination 
with deeply nested bindings caused by GHC’s aggressive inlining. Our current 
implementation is only a prototype and there are a number of obvious places 
for improvement: early elimination of variables during analysis; building a more 
efficient constraint solver; and, gracefully losing information when the analysis 
begins to take too much time. Another possibility worth investigating is a hy- 
brid (Fix)/(FixC) approach which uses FixC after a certain number of fixpoint 
iterations. Arguably, the (Fix) implementation could be used now in a highly 
optimizing compilation mode where we are willing to pay (considerable) time 
for the best possible performance. 

6 Conclusion and Related Work 

There is a huge amount of theoretical work on strictness analysis, but very little
of it has translated into practice. Our aim with this work was to see how far
we could go in producing an accurate strictness analyser for a real programming
language while remaining practical.

The starting point for the work presented here is our binding-time analysis
using Boolean constraints [3]. The constraints that arise in strictness analysis
are more complex (Horn formulae rather than simple implications), but the approaches
are quite similar. Indeed, the two implementations share a considerable
amount of code. We claim the analysis we have built is (almost) practical, and
accurate compared to other implemented analysers.
Other implemented strictness analyses include Jensen et al. [?], which implements
Burn et al.’s abstract interpretation approach using chaotic fixpoint
iterations (which give a demand-driven evaluation of function descriptions that
makes the higher-order function descriptions far more tractable). Unfortunately,
the chaotic fixpoint approach does not handle separate modular compilation, and
the only experimental data they give is for very small programs. Seward [11] developed
an abstract interpretation based strictness analysis in GHC which was
capable of analysing large programs. The approach is essentially first-order, and
loses precision for polymorphic functions. Results are given for small programs,
and it is not clear how effective it is in practice.

The Clean functional language has a well-regarded strictness analyser which
finds strictness information by a process of abstract reduction [?]. It is not easy
to compare our results with this system. The system is more complex than the
system presented here, and consequently difficult both to prove correct and to implement.

The theoretical approaches that are most similar to ours are that of Kuo and
Mishra [7], who gave the first constraint-based strictness analysis and first used
constraint-based fixpoints. They also gave an informal discussion of handling
let-style polymorphism. Their approach is weak because it does not express
conjunctive strictness information directly, but needs to do case-by-case analysis.

Wright [?] employs an annotated type system to express strictness properties.
In contrast to our system, he annotates function types instead of base types, and
he requires the full Boolean domain.

The closest work to ours is probably Jensen’s framework for strictness [?].
He uses an inference system similar to ours to determine strictness types. Conditional
properties are expressed using a ? operator, where $\varphi?\psi$ is interpreted as $\bot$
if $\psi = \bot$, and $\varphi$ otherwise. For example, the strictness of f from the introduction
is (in a format as close to ours as possible)

$$f : \forall\delta_1\delta_2\delta_3\delta_4.\,(\delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4 \mapsto (\delta_2 \wedge \delta_3)?\delta_4) \wedge (\delta_1 \mapsto \delta_2 \mapsto \delta_3 \mapsto \delta_4 \mapsto \delta_1?\delta_4)$$

Clearly, the conditional expressions $\varphi?\psi$ play the same role as $\delta$ in the constraint
$\delta \rightarrow [\![(\tau_1 \leq_s \tau) \wedge (\tau_2 \leq_s \tau)]\!]$ of our (If) rule. However, the constraints in Jensen’s approach
are complicated, and he needs to define the meaning of new operators and rules for
constraint simplification. For example, he reduces the constraint $\delta_1?\delta_2 \leq \delta_3 \wedge C$
to the (disjunctive) set of constraints $\{\delta_1 \leq \delta_3 \wedge C,\ \delta_2 = \bot \wedge C\}$, where each new
constraint represents one possible way of proving the original constraint. This
can be avoided by simply using well-understood Boolean constraints.

Jensen’s work is based on ranked intersection types, which may appear to 
be much more expressive than the strictness types we consider. However, we 
conjecture that his approach captures no more useful strictness information. 
At least for the examples in Jensen’s paper the two approaches give equivalent 
strictness information. There is no implementation of Jensen’s approach that 
we are aware of, and the approach does not tackle let-style polymorphism and 
algebraic data structures. 

We have presented an accurate and feasible strictness analysis based on
HORN constraints, which we believe will eventually form the basis of a robust
and practical strictness analysis in everyday use in a real compiler. In future
work, we will investigate the exact connection of our work with that of Jensen,
Wright and others.
Abstract. We present the implementation of cTI, a system for universal
left-termination inference of logic programs.

Termination inference generalizes termination analysis/checking. Traditionally,
a termination analyzer tries to prove that a given class of queries
terminates. This class must be provided to the system, requiring user
annotations. With termination inference such annotations are no longer
necessary. Instead, all provably terminating classes to all related predicates
are inferred at once.

The architecture of cTI is described and some optimizations are discussed.
Running times for classical examples from the termination literature
in LP and for some middle-sized logic programs are given.



1 Introduction 

Termination is a crucial aspect of program verification. It is of particular importance
for logic programs [?], since there are a priori no syntactic restrictions
to queries. In fact, most predicates do not terminate for the most general queries.
Termination has been the subject of many works in the last fifteen years in the
logic programming community. Contrary to other languages, there are
two notions of termination for logic programs [?]: existential and universal termination.
To illustrate them, assume we use a standard Prolog engine. Existential
termination means that either the computation finitely fails or produces one solution
in finite time. When asked for further solutions, it may loop. On the other
hand, universal termination means that the computation yields all solutions and
fails in finite time (if we repeatedly ask for further solutions). Although existential
termination plays an important role for normal logic programs, it has severe
drawbacks: it is not instantiation-closed (i.e., a goal may existentially terminate,
but some of its instances may not terminate), hence it is not and-compositional
(i.e., two goals may existentially terminate, but not their conjunction); finally it

¹ A preliminary version of this paper was presented at the Workshop on Parallelism
and Implementation Technology for Constraint Logic Programming Languages (ed.
Ines de Castro Dutra), CL’2000, London.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 93-|nSl 2001. 
depends on the textual order of clauses. Universal termination has none of these
brittle properties.

Existential termination has been the subject of only a few efforts [?],
whereas most research focused on universal termination. There are two main
directions (see [?] for a survey): characterizing termination [?], and finding
weaker but decidable sufficient conditions that lead to actual algorithms,
e.g. [?], and [?] using complexity upper bounds. While our research belongs
to both streams, we focus in this paper on the implementation of our
approach. A companion paper positions our approach in the theoretical setting
of acceptability for constraint logic programming.

Our main contribution compared to automated termination analysis [?]
is that we infer sufficient universal termination conditions from the text
of any Prolog program. Inference implies that we adopt a bottom-up approach
to termination. There is no need to define a class of queries of interest. We point
out that giving a class of queries is imposed by all other works we are aware of.
If required, these classes can be easily simulated within our framework.

Our system, cTI (constraint-based Termination Inference), is available at
URL http://www.complang.tuwien.ac.at/cti and has been realized in SICStus
Prolog. The only correctness requirement we currently impose on ISO-Prolog [?]
programs is that they must not create infinite rational terms. Hence we consider
execution with occur check or NSTO (not subject to occur check) programs [21]
that are safely executed with any standard-complying system. cTI is also used
within the LP environment GUPU [?].

In Section 2 we present cTI informally with an example analysis. The central fixpoint algorithms for computing models are covered in Section 3. Our scheme for determining level mappings is given in Section 4. Finally, Section 5 presents an empirical evaluation.



2 An Overview of cTI 

Our aim is to compute classes of queries for which universal left termination is 
guaranteed. 

Definition 1. Let P be a Prolog program and q a predicate symbol of P. A termination condition for q is a set $TC_q$ of goals of the form $\leftarrow q(\tilde t)$ such that, for any goal $G \in TC_q$, each derivation of P and G using the left-to-right selection rule is finite.

Our analyzer uses three main constraint structures (see |25|2tij for a presentation of the CLP paradigm): Herbrand terms (CLP($\mathcal{H}$)) for the initial program P, and non-negative integers (CLP($\mathbb{N}$)) and booleans (CLP($\mathbb{B}$)) for approximating P. The correspondence between these structures relies on approximations 1231, which are a simple form of abstract interpretation also called abstract compilation. We illustrate our method for inferring termination conditions using the predicates app/3, app3/4, and nrev/2.
app([], X, X).
app([E|X], Y, [E|Z]) :-
    app(X, Y, Z).

nrev([], []).
nrev([E|X], Y) :-
    nrev(X, Z),
    app(Z, [E], Y).

app3(X, Y, Z, U) :-
    app(X, Y, V),
    app(V, Z, U).



1. The initial Prolog program P is mapped to a program in CLP($\mathbb{N}$) using an approximation based on a symbolic norm. In our example, we use the term-size norm¹:

$$\|t\|_{term\text{-}size} = \begin{cases} 1 + \sum_{i=1}^{n} \|t_i\|_{term\text{-}size} & \text{if } t = f(t_1, \ldots, t_n),\ n > 0 \\ 0 & \text{if } t \text{ is a constant} \\ t & \text{if } t \text{ is a variable} \end{cases}$$

E.g. $\|f([], X)\|_{term\text{-}size} = 1 + X$. All non-monotonic elements of the program are approximated by monotone constructs. E.g., Prolog's unsound negation \+G is approximated by ((G, false) ; true). The approximation guarantees that if a goal in the CLP($\mathbb{N}$) program is terminating, then the corresponding original goals in P also terminate.



app_N(0, X, X).
app_N(1+E+X, Y, 1+E+Z) :-
    app_N(X, Y, Z).

nrev_N(0, 0).
nrev_N(1+E+X, Y) :-
    nrev_N(X, Z),
    app_N(Z, 1+E, Y).

app3_N/4: same as app3/4



2. In $\mathbb{N}$ we compute a model of all predicates. The model describes, with a finite conjunction of linear equalities and inequalities, the inter-argument relations post that hold for every solution. The actual computation is performed with CLP($\mathbb{Q}$), using a generic fixpoint calculator with standard widening detailed in Section 3. In our example the least model is found; in general, however, only a less precise model is determined. With the help of such a numeric model, for each recursive predicate p (the only source of potential non-termination), we compute a valid level mapping called $\mu_p$ (see Section 4). For instance, the intuitive meaning of $\mu_{app}$ is: for any ground recursive clause defining $app^{\mathbb{N}}$, the first and the third arguments decrease. Section 4 explains more precisely how the model in $\mathbb{N}$ is taken into account (roughly speaking, it helps in dealing with variables that are local to a clause and that reappear in recursive calls). As $app3^{\mathbb{N}}$ is not recursive, we set its level mapping to 0.

(least) models
$post^{\mathbb{N}}_{app}(x,y,z) = (x + y = z)$
$post^{\mathbb{N}}_{nrev}(x,y) = (x = y)$
$post^{\mathbb{N}}_{app3}(x,y,z,u) = (x + y + z = u)$

level mappings
$\mu_{app}(x,y,z) = \min(x,z)$
$\mu_{nrev}(x,y) = x$
$\mu_{app3}(x,y,z,u) = 0$



¹ cTI applies the term-size norm by default. All ISO-predefined predicates are pre-analyzed for this norm. However, a similar analysis of any pure Prolog program can be done with any linear norm, and remains correct. Note that the resulting boolean termination conditions should then be lifted to termination conditions with respect to the chosen linear norm.
3. The CLP($\mathbb{N}$) program is mapped to $P^{\mathbb{B}}$, a program in CLP($\mathbb{B}$). Here 1 means that an argument is bounded wrt the considered norm, while 0 means that this information is unknown. Note that the obtained program no longer has the same termination properties. Its sole purpose is to determine the actual boundedness dependencies within the program. The simplified structure allows us to always compute the least model. For each predicate, its previously computed linear level mapping is represented by a single boolean term.



app_B(1, X, X).
app_B(1∧E∧X, Y, 1∧E∧Z) :-
    app_B(X, Y, Z).

nrev_B(1, 1).
nrev_B(1∧E∧X, Y) :-
    nrev_B(X, Z),
    app_B(Z, 1∧E, Y).

app3_B(X, Y, Z, U) :-
    app_B(X, Y, V),
    app_B(V, Z, U).



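least models
$post^{\mathbb{B}}_{app}(x,y,z) = (x \wedge y) \leftrightarrow z$
$post^{\mathbb{B}}_{nrev}(x,y) = x \leftrightarrow y$
$post^{\mathbb{B}}_{app3}(x,y,z,u) = (x \wedge y \wedge z) \leftrightarrow u$

level mappings
$\mu^{\mathbb{B}}_{app}(x,y,z) = x \vee z$
$\mu^{\mathbb{B}}_{nrev}(x,y) = x$
$\mu^{\mathbb{B}}_{app3}(x,y,z,u) = 1$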



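4. Using the information obtained in $P^{\mathbb{B}}$, boolean termination conditions are determined with the following boolean μ-calculus formulae, taking into account level mappings and boundedness propagation (at each program point). The greatest fixpoint νT is determined with a μ-solver.

$$pre_{app} = \nu T.\lambda(x,y,z).\left\{\begin{array}{l} \mu^{\mathbb{B}}_{app}(x,y,z) \\ \wedge\ \forall e, x', z'.\,[(x \leftrightarrow (1 \wedge e \wedge x')) \wedge (z \leftrightarrow (1 \wedge e \wedge z'))] \rightarrow T(x', y, z') \end{array}\right.$$

$$pre_{nrev} = \nu T.\lambda(x,y).\left\{\begin{array}{l} \mu^{\mathbb{B}}_{nrev}(x,y) \\ \wedge\ \forall e, x', z.\,[x \leftrightarrow (1 \wedge e \wedge x')] \rightarrow T(x', z) \\ \wedge\ \forall e, x', z.\,[(x \leftrightarrow (1 \wedge e \wedge x')) \wedge post^{\mathbb{B}}_{nrev}(x', z)] \rightarrow pre_{app}(z, 1 \wedge e, y) \end{array}\right.$$

$$pre_{app3} = \nu T.\lambda(x,y,z,u).\left\{\begin{array}{l} \mu^{\mathbb{B}}_{app3}(x,y,z,u) \\ \wedge\ \forall v.\, 1 \rightarrow pre_{app}(x, y, v) \\ \wedge\ \forall v.\, post^{\mathbb{B}}_{app}(x, y, v) \rightarrow pre_{app}(v, z, u) \end{array}\right.$$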

Let us try to explain such a boolean μ-calculus formula intuitively. First, we want at least one level mapping to be bounded. Second, if there is a call to a clause and its first i body atoms (from left to right) succeed, then, after including their success, the call to the (i+1)-th body atom must terminate. Third, we are interested in the largest solution. Solving the equations of our example gives:

$pre_{app}(x,y,z) = x \vee z$
$pre_{nrev}(x,y) = x$
$pre_{app3}(x,y,z,u) = (x \wedge y) \vee (x \wedge u)$

5. These boolean termination conditions lift to termination conditions (see Definition 1) with the following interpretation, where the c's are CLP($\mathcal{H}$) constraints:

– any goal ← c, app(X,Y,Z) left-terminates if X or Z is ground in c;
– any goal ← c, nrev(X,Y) left-terminates if X is ground in c;
– any goal ← c, app3(X,Y,Z,U) left-terminates if either X and Y are ground in c or X and U are ground in c.
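Because the boolean domain is finite, the greatest fixpoints of step 4 can be checked by brute force. The Python sketch below is our own illustration, not cTI's μ-solver. Each relation is represented by the set of 0/1 tuples on which it holds, and the iteration starts from the top element (always true) and descends. The helper names (gfp, step_app, step_nrev) are hypothetical.

```python
from itertools import product

B = (0, 1)


def imp(a, b):
    """Boolean implication a -> b."""
    return (not a) or bool(b)


def gfp(arity, step):
    """Greatest fixpoint of a monotone `step` on boolean relations.

    A relation is the frozenset of 0/1 tuples on which it is true; we start
    from the top element (true everywhere) and iterate downwards.
    """
    current = frozenset(product(B, repeat=arity))
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


def step_app(T):
    # mu_app(x,y,z) = x \/ z, and the recursive call must terminate:
    # forall e,x',z'. [(x <-> 1/\e/\x') /\ (z <-> 1/\e/\z')] -> T(x',y,z')
    return frozenset(
        (x, y, z) for x, y, z in product(B, repeat=3)
        if (x or z)
        and all(imp(x == (e & x1) and z == (e & z1), (x1, y, z1) in T)
                for e, x1, z1 in product(B, repeat=3)))


pre_app = gfp(3, step_app)


def step_nrev(T):
    # mu_nrev(x,y) = x; first call nrev(x', z); second call app(z, 1/\e, y)
    # after the success of nrev(x', z), i.e. post_nrev(x', z) = (x' <-> z)
    return frozenset(
        (x, y) for x, y in product(B, repeat=2)
        if x
        and all(imp(x == (e & x1), (x1, z) in T)
                for e, x1, z in product(B, repeat=3))
        and all(imp(x == (e & x1) and x1 == z, (z, e, y) in pre_app)
                for e, x1, z in product(B, repeat=3)))


pre_nrev = gfp(2, step_nrev)

# app3 is not recursive (mu_app3 = 1), so no fixpoint is needed.
# Calls: app(x, y, v), then app(v, z, u) after post_app(x, y, v) = (x/\y <-> v).
pre_app3 = frozenset(
    (x, y, z, u) for x, y, z, u in product(B, repeat=4)
    if all((x, y, v) in pre_app for v in B)
    and all(imp(v == (x & y), (v, z, u) in pre_app) for v in B))
```

The three sets computed this way are exactly the truth tables of $x \vee z$, $x$ and $(x \wedge y) \vee (x \wedge u)$. Starting from "always true" is what makes each result the greatest fixpoint rather than the least one.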

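The correctness of the analysis is based on the following result (see |2S| for details and proofs). Let $\bar p$ denote the equivalence class of p with respect to mutual recursion seen as an equivalence relation, $\tilde x$ denote a tuple of distinct variables, and $\forall F$ denote the universal closure of the formula F.

Theorem 1. (Correctness)

– Let P be a logic program and Q a query. Let $P^{\mathbb{N}}$ and $Q^{\mathbb{N}}$ be their images by the term-size norm. If any left derivation of $P^{\mathbb{N}}$ and $Q^{\mathbb{N}}$ is finite, then any left derivation of P and Q is finite.

– We now consider left termination of $P^{\mathbb{N}}$ in CLP($\mathbb{N}$). Let $\Pi_P$ be the set of predicate symbols defined in $P^{\mathbb{N}}$. Assume that:

  • $\forall p \in \Pi_P$, $post^{\mathbb{B}}_p$ has been computed (see point 3 above and Section 3);
  • $\forall p \in \Pi_P$, $\mu^{\mathbb{B}}_p$ is a boolean representation of some valid level mappings for p (see point 3 above and Section 4);
  • p is defined by $m_p$ rules ($1 \le k \le m_p$): $p(\tilde x) \leftarrow c_{k,0}, p_{k,1}(\tilde x_{k,1}), c_{k,1}, \ldots, p_{k,n_k}(\tilde x_{k,n_k}), c_{k,n_k}$;
  • for each $q \notin \bar p$ that appears in the rules defining p, a termination condition $pre_q$ has been computed.

If the set of boolean terms $\{pre_p\}_{p \in \bar p}$ verifies, for all $p \in \bar p$:

$$\forall [pre_p(\tilde x) \rightarrow \mu^{\mathbb{B}}_p(\tilde x)]$$
$$\forall k, j,\ 1 \le k \le m_p,\ 1 \le j \le n_k:\quad \forall \Big\{ \Big[ pre_p(\tilde x) \wedge c^{\mathbb{B}}_{k,0} \wedge \bigwedge_{i=1}^{j-1} \big( post^{\mathbb{B}}_{p_{k,i}}(\tilde x_{k,i}) \wedge c^{\mathbb{B}}_{k,i} \big) \Big] \rightarrow pre_{p_{k,j}}(\tilde x_{k,j}) \Big\}$$

then $\{pre_p\}_{p \in \bar p}$ is a boolean termination condition for $\bar p$.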

We note that the boolean system defined in Theorem 1 can be used not only to check whether a set of boolean terms is a set of boolean termination conditions but also to infer such a set, with the help of a μ-calculus solver (see point 4 above). Finally, it remains to lift the computed boolean termination conditions to termination conditions (see point 5 above).



3 Fixpoint Computations 



As motivated in Section 2, we have to compute some models of two versions of the initial program: $P^{\mathbb{N}}$, the CLP($\mathbb{N}$) version, and $P^{\mathbb{B}}$, the CLP($\mathbb{B}$) version. To this aim, we have developed an abstract immediate consequence operator $U_P$, similar to the well-known $T_P$. (A similar approach tailored to CLP($\mathbb{R}$) has been described in m.) In this section we rely on numerous results in abstract interpretation IKillSllEi IH.
3.1 The Algorithm

The key notion of our abstract computation is that of a rational interpretation for a predicate symbol p:

Definition 2. Let P be a program with predicate symbol p. We call a rational interpretation of p an equivalence of the form $p(\tilde x) \leftrightarrow c(\tilde x)$, where $c(\tilde x)$ is a (finite) constraint such that $vars(c(\tilde x)) \subseteq \tilde x$. We extend this notion to P: a rational interpretation of P is a set I containing exactly one rational interpretation for each predicate symbol p of P.

We denote by $\mathcal{I}$ the set of all rational interpretations. To compute a rational interpretation that is a model of P, we define below an operator $U_P$. We assume, without loss of generality for these computations, that the rules are of the form $p(\tilde x) \leftarrow c, p_1(\tilde x_1), \ldots, p_n(\tilde x_n)$, where c is the only constraint of the rule. Let $\bigvee$ be an associative-commutative operator which generalizes disjunction in the following sense: for any constraints $c_1(\tilde x)$ and $c_2(\tilde x)$, $\bigvee(c_1, c_2)$ is a constraint over $\tilde x$ such that $\forall [c_1 \vee c_2 \rightarrow \bigvee(c_1, c_2)]$.

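Definition 3. $U_P$ is a function on $\mathcal{I}$ defined for any rational interpretation I of a program P by:

$$U_P(I) = \Big\{ p(\tilde x) \leftrightarrow \bigvee_{cl(p)} \big( \exists_{-\tilde x} (c \wedge \textstyle\bigwedge_{1 \le i \le n} c_i) \big) \;\Big|\; cl(p) = (p(\tilde x) \leftarrow c, p_1(\tilde x_1), \ldots, p_n(\tilde x_n)) \in P,\ \forall i \in [1, n]\ p_i(\tilde x_i) \leftrightarrow c_i \in I \Big\}$$

where $\exists_{-\tilde x}$ denotes the projection onto $\tilde x$.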

We define the successive powers of $U_P$ as usual. It turns out that $U_P$ is monotone and continuous. Now let us establish a link between the meaning of a program P and the $U_P$ operator. First, we give a ground semantics of a rational interpretation:

Definition 4. Let I be a rational interpretation. We define the semantics of I by $[I] = \{p(\tilde d) \mid p(\tilde x) \leftrightarrow c(\tilde x) \in I,\ \tilde d \in \mathcal{D}^n,\ \models_{\mathcal{D}} c(\tilde d)\}$, where $\mathcal{D}$ is the domain of computation.

For any interpretation I, we have $T_P([I]) \subseteq [U_P(I)]$. Now, as a fixpoint I of $U_P$ verifies $T_P([I]) \subseteq [U_P(I)] = [I]$, we get: any fixpoint of $U_P$ is a model of P.

For CLP($\mathbb{B}$), we set $\bigvee(c_1, c_2) = c_1 \vee c_2$. Hence we have $T_P([I]) = [U_P(I)]$, which justifies the use of $U_P$ for computing the least boolean model (= lfp($U_P$)) of P. Figure 1 presents an algorithm for the $U_P$ operator.
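For CLP($\mathbb{B}$), the least model can be illustrated on the boolean program of Section 2 with a naive Kleene iteration. The following Python sketch is ours, not cTI's implementation. A boolean constraint is represented extensionally, as the set of 0/1 tuples that satisfy it, so the generalized disjunction is set union. The clause encoding and the names U_P, least_model and PREDICATES are hypothetical.

```python
from itertools import product

B = (0, 1)
PREDICATES = ("app", "nrev", "app3")


def U_P(I):
    """One application of U_P to the boolean program P^B of Section 2.

    An interpretation maps each predicate to the set of 0/1 tuples where it
    holds; in CLP(B) the generalized disjunction is plain union.
    """
    J = {p: set() for p in PREDICATES}            # J <- empty
    # app_B(1, X, X).
    J["app"] |= {(1, x, x) for x in B}
    # app_B(1/\E/\X, Y, 1/\E/\Z) :- app_B(X, Y, Z).
    J["app"] |= {(e & x, y, e & z) for (x, y, z) in I["app"] for e in B}
    # nrev_B(1, 1).
    J["nrev"].add((1, 1))
    # nrev_B(1/\E/\X, Y) :- nrev_B(X, Z), app_B(Z, 1/\E, Y).
    # Since 1/\E = E, the second argument of the app tuple is E itself.
    J["nrev"] |= {(e & x, y) for (x, z) in I["nrev"]
                  for (z2, e, y) in I["app"] if z2 == z}
    # app3_B(X, Y, Z, U) :- app_B(X, Y, V), app_B(V, Z, U).
    J["app3"] |= {(x, y, z, u) for (x, y, v) in I["app"]
                  for (v2, z, u) in I["app"] if v2 == v}
    return J


def least_model(op):
    """Kleene iteration from the empty interpretation up to lfp(op)."""
    I = {p: set() for p in PREDICATES}
    while True:
        J = op(I)
        if J == I:
            return I
        I = J


M = least_model(U_P)
```

After a few iterations, `M["app"]` holds exactly the tuples with $z = x \wedge y$, `M["nrev"]` those with $x = y$, and `M["app3"]` those with $u = x \wedge y \wedge z$, matching the least models listed in Section 2. Because the boolean domain is finite, no widening is needed.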

3.2 Widenings 

For CLP($\mathbb{Q}$), we set $\bigvee(c_1, c_2) = \text{convex\_hull}(c_1, c_2)$. Computing lfp($U_P$) diverges in the general case. Therefore a widening operator (∇) [E| is required to enforce convergence at the expense of precision. cTI uses widening only for computations in CLP($\mathbb{Q}$), since the least boolean model is always reached after finitely many iterations. However, we coded a generic fixpoint calculator for both CLP($\mathbb{Q}$) and CLP($\mathbb{B}$) [I9i24l34j. A simple widening can be found in m; we use an equivalent definition m.
function U_P(I) : J
Require: I, a rational interpretation of P
Ensure: J = U_P(I), a rational interpretation of P
1: J ← ∅
2: for all clause p(x̃) ← c, p_1(x̃_1), ..., p_n(x̃_n) ∈ P do
3:   for i = 1 to n do
4:     let p_i(x̃_i) ↔ c_i ∈ I
5:   end for
6:   let p(x̃) ↔ c' ∈ J
7:   c ← ⋁(c', ∃_{-x̃}(c ∧ c_1 ∧ ... ∧ c_n))
8:   J ← update(J, p(x̃) ↔ c)
9: end for
10: return J

Fig. 1. U_P, a T_P-Like Operator.



Definition 5. Let $S_1$ and $S_2$ be two sets of linear inequalities defining two polyhedra in $\mathbb{Q}^n$. Then: $S_1 \nabla S_2 = \{\beta \in S_1 \mid S_2 \models \beta\}$.

Fig. 2 presents an algorithm that iterates $U_P$ until it reaches a fixpoint. In cTI's current implementation, prec is set to 3 for CLP($\mathbb{Q}$).

It remains to be shown that $I_n = \text{ite\_Up}(prec)$ is a model of P. First, note that, by induction on k, $I_k \subseteq I_{k+1}$. So we actually have an equality when we reach line 11 for the last time: $I_n = I_{n-1}$. The last assignment to $I_n$ is then either at line 7, if n < prec, in which case $I_n = U_P(I_{n-1}) = U_P(I_n)$, hence $T_P([I_n]) \subseteq [I_n]$; or it is at line 9: $I_n = I_{n-1} \nabla U_P(I_{n-1}) \supseteq U_P(I_{n-1}) = U_P(I_n)$, by definition of any widening operator ∇. But we know that $T_P([I]) \subseteq [U_P(I)]$ for all I. Again, $T_P([I_n]) \subseteq [I_n]$.
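To make Definition 5 and Fig. 2 concrete, here is a small Python sketch on one-dimensional intervals, not on the polyhedra cTI manipulates with CLP($\mathbb{Q}$). An interval (lo, hi) stands for the two inequalities $x \ge lo$ and $x \le hi$. The widening therefore keeps a bound only if the new iterate still satisfies it, which is Definition 5 restricted to these two constraints. The program nat/1 and all function names are hypothetical.

```python
import math

BOTTOM = None  # the empty interpretation I_0 (no solution found yet)


def hull(a, b):
    """Convex hull of two intervals: the 1-D generalized disjunction."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(s1, s2):
    """Keep the constraints of S1 (x >= lo, x <= hi) that S2 still satisfies."""
    if s1 is BOTTOM:
        return s2
    if s2 is BOTTOM:
        return s1
    lo = s1[0] if s2[0] >= s1[0] else -math.inf
    hi = s1[1] if s2[1] <= s1[1] else math.inf
    return (lo, hi)


def included(a, b):
    """a is contained in b (BOTTOM is contained in everything)."""
    if a is BOTTOM:
        return True
    if b is BOTTOM:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def U_P(I):
    """U_P for the hypothetical program  nat(0).  nat(X+1) :- nat(X)."""
    shifted = BOTTOM if I is BOTTOM else (I[0] + 1, I[1] + 1)
    return hull((0, 0), shifted)


def ite_Up(prec):
    """Fig. 2: plain iteration while n < prec, widening from then on."""
    history = [BOTTOM]                      # history[n] holds I_n
    while True:
        I = U_P(history[-1])                # line 4
        n = len(history)                    # line 5: the new index n
        history.append(I if n < prec else widen(history[-1], I))  # lines 6-10
        if included(history[n], history[n - 1]):                  # line 11
            return history[n]
```

With prec = 3 the iterates are (0, 0), then (0, 1). The widening then drops the upper bound, which (0, 2) violates, and yields (0, ∞). U_P maps this interval to itself, so the loop stops. Without widening, the upper bound would grow forever.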

3.3 Optimizations 

Since the fixpoint computation engine is used twice, generic optimizations applicable to both uses have a substantial effect. The current optimization takes all unit clauses defining the predicate symbols of the analyzed SCC into account in a single pass, and then processes only the non-unit clauses of the SCC. Table 1 compares the runtimes of the non-optimized and the optimized versions of $U_P$. Note that we also replace the union operator $\bigvee$ of line 7 of the algorithm presented in Fig. 1 by a convex hull (in both versions for CLP($\mathbb{Q}$), opt and nopt), which can easily be coded via projection in CLP($\mathbb{Q}$) using a technique first described in 13, and later in |3.

function ite_Up(prec) : I_n
Require: prec, a non-negative integer
Ensure: a rational interpretation I_n such that lfp(U_P) ⊆ I_n
1: n ← 0
2: I_0 ← ∅
3: repeat
4:   I ← U_P(I_n)
5:   n ← n + 1
6:   if n < prec then
7:     I_n ← I
8:   else
9:     I_n ← I_{n−1} ∇ I
10:  end if
11: until I_n ⊆ I_{n−1}
12: return I_n

Fig. 2. An algorithm to finitely reach a superset of lfp(U_P).

4 Computing Level-Mappings

One key concept in many approaches to termination is the use of level mappings, i.e., mappings from ground atoms to natural numbers. Moreover, we are interested here in valid level mappings, which decrease at each recursive call. cTI uses an improvement of an already known technique for their automatic generation. K. Sohn and A. Van Gelder described in 1991 HH an algorithm (SVG) based on linear programming which checks the existence of valid linear level mappings. This method, despite its power, does not seem to have attracted the attention of researchers interested in automating termination analysis. Hence, we recall it after some preliminaries. The remaining subsections then propose an extension of SVG.



4.1 Preliminaries 

We consider pure CLP($\mathbb{N}$) programs, with three predefined symbols for constraints: =, ≥, and ≤, with their standard meaning. These programs are abstractions of (constraint) logic programs using (fixed or inferred) norms. We assume that clauses are written in flat form: $p_0(\tilde x_0) \leftarrow c_0, p_1(\tilde x_1), c_1, \ldots, c_{l-1}, p_l(\tilde x_l), c_l$, with $i \neq j \Rightarrow \tilde x_i \cap \tilde x_j = \emptyset$ (where ∅ denotes the empty set). In this presentation we only consider the case of directly recursive predicates; cTI itself is also able to treat the general case. We write p/n as a shorthand for a predicate symbol p with arity(p) = n. Note that we frequently switch to CLP($\mathbb{Q}^+$), as some computational problems (e.g. satisfiability) are much cheaper in this structure. This clearly causes a loss of precision in the analysis: results are correct but not complete. From now on, we write CLP for CLP($\mathbb{N}$) or CLP($\mathbb{Q}^+$). Section 3 showed how we can compute a model $M_P$ for a CLP program P, where each predicate $p(\tilde x)$ is defined as a (finite) conjunction $c_p(\tilde x)$ of CLP constraints. We use this model to simplify the program P.
Table 1. Impact of the Optimization on the Analysis Times.

times in [s]          Post_ℕ                    Post_𝔹
Program         opt     nopt    gain      opt     nopt    gain
ANN             1.07    1.33     20%      0.46    0.65     29%
BID             0.17    0.27     37%      0.09    0.14     35%
BOYER           2.55    3.48     27%      0.25    0.48     48%
BROWSE          0.37    0.35     -5%      0.12    0.15     20%
CREDIT          0.12    0.19     37%      0.07    0.13     46%
MINISSAEXP      2.98    2.74     -9%      0.73    1.17     38%
PEEPHOLE        1.24    1.35      8%      0.47    0.63     25%
PLAN            0.13    0.20     35%      0.09    0.13     31%
QPLAN           1.52    1.88     19%      0.62    0.86     28%
RDTOK           1.22    0.70    -74%      0.26    0.29     10%
READ            1.05    1.26     17%      0.34    0.50     32%
WARPLAN         0.82    0.96     15%      0.27    0.37     27%
average                          11%                       31%
min.                            -74%                       10%
max.                             37%                       48%



Definition 6. Let $M_P$ be a model of the CLP program P. The definition of a predicate p is simplified with respect to $M_P$ when, for the clauses defining p/n, we add to the right of each predicate $q(\tilde x)$ (including p) its model $c_q(\tilde x)$ relative to $M_P$. Moreover, the predicates $q/m \neq p/n$ which appear in the bodies are replaced by true (e.g. the dummy constraint 0 = 0). Hence we end up with a finite set of CLP clauses of the form $p(\tilde x_0) \leftarrow c_0, p(\tilde x_1), c_1, \ldots, c_{l-1}, p(\tilde x_l), c_l$. We call the result the simplified program.

We are interested in the automatic discovery of valid linear level mappings 
in such a program. We therefore give definitions for the required notions. 

Definition 7. Let p/n be a recursive predicate symbol of a CLP program P. A linear level mapping μ for $p(x_1, \ldots, x_n)$ is a linear relation $\mu(\tilde x) = \sum_{i=1}^{n} \mu_i x_i$, where the coefficients $\mu_i$ are non-negative integers.

Such linear level mappings should satisfy a property ensuring their usefulness 
for left-termination: 

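Definition 8. A linear level mapping μ for p is valid with respect to the simplified program if, for each recursive clause $p(\tilde x_0) \leftarrow c_0, p(\tilde x_1), c_1, \ldots, c_{l-1}, p(\tilde x_l), c_l$ defining p in it and for k = 0 to l − 1, we have $\bigwedge_{i=0}^{k} c_i \rightarrow \mu^T \tilde x_0 \ge 1 + \mu^T \tilde x_{k+1}$, where $\mu^T$ denotes the transpose of the vector μ.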

4.2 The Algorithm SVG 

Let us first quickly review the algorithm of Sohn and Van Gelder. It aims at checking the existence of one valid linear level mapping. SVG starts with a pure CLP program P and a constrained goal. A top-down boundedness analysis (see EM I) reveals the calling modes of each predicate: arguments are detected as either bounded (denoted b) or unbounded (u). A CLP model M is computed and P is simplified with respect to M (Definition 6). Then SVG examines each recursive procedure p/n in turn (the precise order does not matter). Let us symbolically define the level mapping for $p(x_1, \ldots, x_n)$ as $\mu(\tilde x) = \sum_{i=1}^{n} \mu_i x_i$, where $\mu_i = 0$ if $x_i$ is labelled as unbounded with respect to the calling mode of p/n, and $\mu_i \ge 0$ if $x_i$ is labelled as bounded. Each clause $r_i$ is processed. For one such clause, say $p(\tilde x_0) \leftarrow c_0, p(\tilde x_1), c_1, \ldots, c_{l-1}, p(\tilde x_l), c_l$, l simplified rules (for k = 0 to k = l − 1) are constructed: $p(\tilde x_0) \leftarrow \bigwedge_{0 \le i \le k} c_i,\ p(\tilde x_{k+1})$. One can assume that the constraint $C_{i,k} = \bigwedge_{0 \le j \le k} c_j$ is satisfiable, already projected onto $\tilde x_0 \cup \tilde x_{k+1}$, only contains inequalities of the form ≤, and implies $\tilde x_0 \ge 0$ and $\tilde x_{k+1} \ge 0$. Such a simplified rule gives rise to the following (pseudo-)linear programming problem:

$$\text{minimize } \theta = \mu^T (\tilde x_0 - \tilde x_{k+1}) \text{ subject to } C_{i,k} \qquad (1)$$

A valid linear level mapping μ exists (at least for this recursive call of this clause) if $\theta^* \ge 1$, where $\theta^*$ denotes the minimum of the objective function. Unfortunately, because of the symbolic constants μ, (1) is not a linear programming problem. The clever idea of Sohn and Van Gelder is to consider its dual form:

$$\text{maximize } \eta = y\beta \text{ subject to } y \ge 0 \wedge yA \ge (\mu, -\mu) \qquad (2)$$

where β and A are automatically derived while switching to the dual form of (1). By duality theory (see jUJI for instance), we have $\theta^* = \eta^*$. Now, the authors observe that μ appears linearly in the dual problem (this is not true for (1)), because no $\mu_i$ appears in A. Hence (2) can be rewritten, by adding $\eta \ge 1$ and $\mu^b \ge 0 \wedge \mu^u = 0$, as $S_{ij}$, a set of linear inequations. If the conjunction $S_p = \bigwedge_{ij} S_{ij}$ over each recursive call and each clause defining p/n is satisfiable, then there exists a valid linear level mapping for p/n.
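As a worked instance (our own illustration, not taken from the SVG paper), consider the recursive clause of $app^{\mathbb{N}}$ from Section 2 in flat form, $p(x_0, y_0, z_0) \leftarrow c_0, p(x_1, y_1, z_1)$. Take $\mu(x,y,z) = ax + by + cz$ and treat all arguments as bounded. Keeping the local variable e of the clause explicit, problem (1) can be solved by hand. Adding the model constraints $post_{app}$ to both atoms does not change the minimum here.

```latex
\begin{aligned}
C_{1,0} &= \{\, x_0 = 1 + e + x_1,\ y_0 = y_1,\ z_0 = 1 + e + z_1,\ e \ge 0 \,\} \\
\theta &= a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1) = (a + c)(1 + e) \\
\theta^* &= \min_{e \ge 0} (a + c)(1 + e) = a + c \qquad (a, c \ge 0) \\
\theta^* \ge 1 &\iff a + c \ge 1
\end{aligned}
```

The test thus succeeds exactly for the coefficients satisfying $a + c \ge 1$, which is the constraint $\Pi_\mu(S_p)$ reported for app/3 in Example 1 below.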



4.3 An Extension of SVG 

Instead of checking the satisfiability of $S_p$, we can project it onto μ. In this case we do not need the top-down boundedness analysis explained in Subsection 4.2: all arguments are assumed bounded. Hence we get, in one constraint, all the valid linear level mappings. It remains to compute the maximal elements of $\Pi_\mu(S_p)$, given the partial order: $\mu^1 \preceq \mu^2$ iff $\forall i \in [1, n]\ \mu^1_i = 0 \Rightarrow \mu^2_i = 0$.

Example 1. For app/3, let $\mu(x,y,z) = ax + by + cz$. We have $\Pi_\mu(S_p) = \{a + c \ge 1\}$. There are two maximal elements: $\mu^1(x,y,z) = x$ and $\mu^2(x,y,z) = z$.

In some sense, given a model for a program, this extension is complete in CLP($\mathbb{Q}^+$), mainly because there are complete algorithms for linear programming and projection. However, a more precise model can lead to more maximal elements, so the precision of the inferred CLP model is important. From an implementation point of view, this algorithm relies heavily on the costly projection operator. In our experience, a good strategy is to project constraints as soon as possible.
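Selecting the maximal elements of $\Pi_\mu(S_p)$ only depends on which coefficients are non-zero. The following Python sketch illustrates that selection step by brute force, which is not how cTI computes it. It assumes the projected constraint is given as a list of linear inequalities over the coefficients, and it only searches coefficients in a small range. The input format, the bound, and the function name are our own assumptions.

```python
from itertools import product


def maximal_level_mappings(n, constraints, bound=2):
    """Maximal valid linear level mappings, found by brute force.

    constraints: list of (coeffs, rhs), each meaning
        coeffs[0]*mu[0] + ... + coeffs[n-1]*mu[n-1] >= rhs,
    used here as a stand-in for the projection Pi_mu(S_p).
    Order: mu1 <= mu2 iff mu1[i] == 0 implies mu2[i] == 0 for every i, so the
    maximal elements are the valid mappings whose support (set of measured
    arguments) is minimal.  Coefficients are only searched in 0..bound.
    """
    supports = set()
    for mu in product(range(bound + 1), repeat=n):
        if all(sum(c * m for c, m in zip(coeffs, mu)) >= rhs
               for coeffs, rhs in constraints):
            supports.add(frozenset(i for i, m in enumerate(mu) if m != 0))
    minimal = [s for s in supports if not any(t < s for t in supports)]
    # Report each maximal element by its 0/1 support vector.
    return sorted(tuple(int(i in s) for i in range(n)) for s in minimal)


# Example 1: app/3 with mu(x, y, z) = a*x + b*y + c*z and Pi_mu(S_p) = {a + c >= 1}
print(maximal_level_mappings(3, [((1, 0, 1), 1)]))  # [(0, 0, 1), (1, 0, 0)]
```

The two support vectors correspond to $\mu^2 = z$ and $\mu^1 = x$. Their boolean disjunction $x \vee z$ is exactly the boolean level mapping $\mu^{\mathbb{B}}_{app}$ used in Section 2.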
5 Running cTI



5.1 Standard Programs from the Termination Literature in LP 

Tables 2 and 3 present timings and results of cTI on some standard LP termination benchmarks, where the following abbreviations mean:

– program: the name of the analyzed program. A star added to a name indicates that we manually tuned cTI by selecting the list-size norm (instead of the term-size norm) with prec = 4 (instead of prec = 3) (see Section 3.2);
– top-level predicate: the predicate of interest;
– Others: checked: the class of queries checked by the analyzers of ji 0y20l43j;
– Others: result: the best result (yes > no) among jlHI2HI43];
– cTI: inferred: the termination condition inferred by cTI (1 means that any call to the predicate terminates; 0 means that cTI cannot find a terminating mode for that predicate);
– cTI: time: the running time for cTI to infer termination conditions.



The Mergesort Mystery Reconsidered. For mergesort (and similarly for mergesort_ap), we encountered a well-known problem that has been solved using additional program transformations in m. The actual problem is split/3, which splits a list (first argument) into two sublists (second and third arguments) whose lengths are almost equal.

split([], [], []).
split([E|X], [E|Y], Z) :- split(X, Z, Y).

Using the term-size norm, cTI obtains the following models, depending on the precision of the numeric abstract interpreter:

prec ≤ 2: $post_{split}(x, y, z) = true$
prec ≥ 3: $post_{split}(x, y, z) = (x = y + z)$

However, these models are not strong enough for inferring termination. We note that the result obtained for prec = 3 is already the best interpretation possible with term-size: the fluctuation of the size of the list elements blurs any finer measurement. Evidently, there is no chance of improvement as long as we stay with the term-size norm and the (unmodified) original program.

However, by switching to the list-size norm defined below, we obtain a model precise enough for proving termination. The missing information is the fact that the lengths of the two split lists differ by at most one. Therefore, both sublists are strictly shorter than x whenever x has at least two elements.

$$\|t\|_{list\text{-}size} = \begin{cases} 1 + \|t'\|_{list\text{-}size} & \text{if } t = [\_ \mid t'] \\ t & \text{if } t \text{ is a variable} \\ 0 & \text{otherwise} \end{cases}$$
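The missing information mentioned above can be observed on concrete data. The following Python sketch mirrors split/3 on ground lists and checks $x = y + z$ and $0 \le y - z \le 1$ for every list length below a bound. On a ground proper list, the list-size norm is simply its length. This is a test on examples, not a proof, and the function names are hypothetical.

```python
def split(xs):
    """Python mirror of
         split([], [], []).
         split([E|X], [E|Y], Z) :- split(X, Z, Y).
    Returns the two sublists (second and third arguments)."""
    if not xs:
        return [], []
    z, y = split(xs[1:])   # the recursive call swaps the two output lists
    return [xs[0]] + y, z


def list_size(xs):
    """List-size norm of a ground (proper) list: its length."""
    return len(xs)


def split_model_holds(max_len):
    """Check x = y + z and 0 <= y - z <= 1 on all list lengths below max_len."""
    for n in range(max_len):
        y, z = split(list(range(n)))
        x_size, y_size, z_size = n, list_size(y), list_size(z)
        if not (x_size == y_size + z_size and 0 <= y_size - z_size <= 1):
            return False
    return True
```

For example, `split([1, 2, 3])` returns `([1, 3], [2])`. Because the two lengths differ by at most one and add up to n, each is smaller than n as soon as n ≥ 2. This is the decrease that mergesort's recursive calls need.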
Table 2. Programs from \2‘2W2\. cTI 0.40, Athlon 750 MHz, 256Mb, SICStus 3.8.4.

                                          Others               cTI
program         top-level predicate   checked   result   inferred       time[s]
PERMUTE         permute(x,y)          x         yes      x              0.15
DUPLICATE       duplicate(x,y)        x         yes      x ∨ y          0.05
SUM             sum(x,y,z)            x ∧ y     yes      x ∨ y ∨ z      0.18
MERGE           merge(x,y,z)          x ∧ y     yes      (x ∧ y) ∨ z    0.26
DIS-CON         dis(x)                x         yes      x              0.24
REVERSE         reverse(x,y,z)        x ∧ z     yes      x              0.08
APPEND          append(x,y,z)         x ∧ y     yes      x ∨ z          0.09
LIST            list(x)               x         yes      x              0.01
FOLD            fold(x,y,z)           x ∧ y     yes      y              0.10
LTE             goal                  1         yes      1              0.13
MAP             map(x,y)              x         yes      x ∨ y          0.09
MEMBER          member(x,y)           y         yes      y              0.03
MERGESORT       mergesort(x,y)        x         no       0              0.43
MERGESORT*      mergesort(x,y)        x         no       x              0.57
MERGESORT_AP    mergesort_ap(x,y,z)   x         yes      z              0.79
MERGESORT_AP*   mergesort_ap(x,y,z)   x         yes      x ∨ z          0.92
NAIVE_REV       naive_rev(x,y)        x         yes      x              0.12
ORDERED         ordered(x)            x         yes      x              0.04
OVERLAP         overlap(x,y)          x ∧ y     yes      x ∧ y          0.05
PERMUTATION     permutation(x,y)      x         yes      x              0.15
QUICKSORT       quicksort(x,y)        x         yes      x              0.39
SELECT          select(x,y,z)         y         yes      y ∨ z          0.08
SUBSET          subset(x,y)           x ∧ y     yes      x ∧ y          0.09
SUBSET          subset(x,y)           y         no       x ∧ y          0.09
SUM             sum(x,y,z)            z         yes      y ∨ z          0.12



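With the list-size norm, cTI obtains the following models for split/3:

prec ≤ 2: $post_{split}(x, y, z) = true$
prec = 3: $post_{split}(x, y, z) = (x = y + z)$
prec ≥ 4: $post_{split}(x, y, z) = (x = y + z \wedge 0 \le y - z \le 1)$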



Table 3. Plümer's Programs, cTI 0.40, Athlon 750 MHz, 256Mb, SICStus 3.8.4.

                                        Plümer                   cTI
program       top-level predicate   checked         result   inferred             time[s]
PL1.1         append(x,y,z)         x ∧ y           yes      x ∨ z                0.08
PL1.1         append(x,y,z)         z               yes      x ∨ z                0.08
PL1.2         perm(x,y)             x               yes      x                    0.16
PL2.3.1       p(x,y)                x               no       0                    0.01
PL3.5.6       p(x)                  1               no       x                    0.05
PL3.5.6a      p(x)                  1               yes      x                    0.06
PL4.0.1       append3(x,y,z,v)      x ∧ y ∧ z       yes      (x ∧ y) ∨ (x ∧ v)    0.10
PL4.4.3       merge(x,y,z)          x ∧ y           yes      (x ∧ y) ∨ z          0.26
PL4.4.6a      perm(x,y)             x               yes      x                    0.12
PL4.5.2       s(x,y)                x               no       0                    0.17
PL4.5.3a      p(x)                  x               no       0                    0.01
PL6.1.1       qsort(x,y)            x               yes      x                    0.39
PL7.2.9       mult(x,y,z)           x ∧ y           yes      x ∧ y                0.21
PL7.6.2a      reach(x,y,z)          x ∧ y ∧ z       no       0                    0.14
PL7.6.2b      reach(x,y,z,t)        x ∧ y ∧ z ∧ t   no       0                    0.22
PL7.6.2c      reach(x,y,z,t)        x ∧ y ∧ z ∧ t   yes      z ∧ t                0.29
PL8.2.1       mergesort(x,y)        x               no       0                    0.43
PL8.2.1*      mergesort(x,y)        x               no       x                    0.58
PL8.2.1a      mergesort(x,y)        x               yes      x                    0.47
MERGESORT_T   mergesort(x,y)        x               yes      x                    0.94
PL8.3.1       minsort(x,y)          x               no       x ∧ y                0.26
PL8.3.1a      minsort(x,y)          x               yes      x                    0.24
PL8.4.1       even(x)               x               yes      x                    0.13
PL8.4.2       e(x,y)                x               yes      x                    0.52

5.2 Middle-Sized Programs

Table 5 presents timings of cTI on some standard benchmarks² from the LP program analysis community. We have chosen twelve well-known, middle-sized logic programs. Almost all the programs are taken from [S], except credit, plan and minissaexp.

² collected by Naomi Lindenstrauss, http://www.cs.huji.ac.il/~naomil, and also available at http://www.complang.tuwien.ac.at/cti/bench

Table 4 describes them, where the following abbreviations mean:

– lines is the number of lines of the Prolog program in pure form (e.g. no disjunction), with one predicate symbol per line and no blank line;
– facts and rules denote, respectively, the numbers of facts (unit clauses) and rules (non-unit clauses) in the program;
– sccs gives the number of strongly connected components (SCCs, i.e. cycles of mutually recursive predicate symbols) in the call graph;
– length denotes the number of predicate symbols in the longest cycle in the call graph;
– vars denotes the sum of the arities of the predicate symbols of the longest cycle in the call graph.
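The sccs, length and vars columns can be derived from a program's call graph. The Python sketch below uses Tarjan's algorithm for strongly connected components. It is our reading of the definitions above, not cTI's code, and it makes two assumptions. First, every component is counted, including a single non-recursive predicate; if only recursive cycles are meant, singleton components without a self-loop should be filtered out. Second, a tie for the longest cycle is resolved by taking the first component found. The example program is hypothetical.

```python
def sccs(graph):
    """Strongly connected components of a call graph (Tarjan's algorithm).

    graph maps each predicate symbol to the predicate symbols it calls.
    """
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)   # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result


def table4_columns(graph, arity):
    """(sccs, length, vars) in the sense of Table 4 (our reading of it)."""
    comps = sccs(graph)
    longest = max(comps, key=len)        # ties: the first component found wins
    return len(comps), len(longest), sum(arity[p] for p in longest)


# Hypothetical program: main/1 calls even/1; even/1 and odd/1 are mutually recursive.
calls = {"main": ["even"], "even": ["odd"], "odd": ["even"]}
arity = {"main": 1, "even": 1, "odd": 1}
print(table4_columns(calls, arity))  # (2, 2, 2)
```

The recursive formulation is kept for readability. For very large call graphs, an iterative version would avoid Python's recursion limit.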

Table 4. Information about the Analyzed Programs.

Program       lines   facts   rules   sccs   length   vars
ANN             571     101      99     44        2      7
BID             108      24      26     20        1      4
BOYER           275      63      78     25        2      5
BROWSE          107       4      29     15        1      6
CREDIT          108      33      24     24        1      4
MINISSAEXP      870      42     237    101        5     17
PEEPHOLE        322      72      80     11        2      5
PLAN             64      12      17     16        1      4
QPLAN           403      63      87     38        3     11
RDTOK           285       7      57     12        4     12
READ            299      15      75     17        7     33
WARPLAN         304      43      68     33        3     14

The first five columns of Table 5 indicate the time for computing:

– a model $Post_{\mathbb{N}}$ (Section 3);
– the constraint defining the level mapping μ (Section 4);
– the concrete level mapping;
– the least model $Post_{\mathbb{B}}$;
– the boolean termination conditions.

The timings are minimum execution times over ten iterations. Next we give: 

— the total runtime (including all syntactic transformations); 

— the speed of the analysis (the average number of analyzed lines of code in 
one second); 

— the quality of the analysis, computed as the ratio of the number of relations 
which have a non-empty termination condition over the total number of 
relations. 

Let us comment on the results of Table 5.

The speed of the analysis is surprisingly low for peephole compared with the other programs. A closer look at its code shows that its call graph contains 5 cycles of length 2, which slow down the computation of the constraints defining the level mapping.

We note that cTI was able to prove that bid, credit, and plan are left-terminating (see PI: every ground atom left-terminates). For any such program P, $T_P$ has only one fixpoint (P(, Theorem 8.13), which helps in proving partial correctness. Moreover, the ground semantics of such a program is decidable (Prolog is the decision procedure!), which helps in testing and validating the program.

On the other hand, when the quality of the analysis is less than 100%, there exists at least one SCC whose inferred termination condition is 0. Such SCCs are clearly identified, which may help the programmer. Here are some reasons why cTI may fail: potential non-termination, an unknown predicate in the code (assumed to be non-terminating by cTI), a poor numeric model, non-existence of a linear level mapping for a predicate, an inadequate norm, etc. Also, the analysis of the SCCs which depend on a failed one is likely to fail.



Table 5. Middle-Sized Programs, cTI 0.40, Athlon 750 MHz, 256Mb, SICStus 3.8.4.

              times in [s]
program       Post_ℕ   C      μ      Post_𝔹   TC     total time   lines/sec   quality %
ANN           1.07     2.62   0.17   0.46     0.13    5.01         114          48
BID           0.17     0.33   0.03   0.09     0.04    0.79         136         100
BOYER         2.55     0.36   0.03   0.25     0.05    3.53          78          85
BROWSE        0.37     1.01   0.08   0.12     0.03    1.81          59          60
CREDIT        0.12     0.18   0.04   0.07     0.03    0.61         177         100
MINISSAEXP    3.51     6.02   0.73   1.10     0.35   12.33          70          68
PEEPHOLE      1.24     9.78   0.14   0.47     0.12   12.08          27          93
PLAN          0.13     0.32   0.03   0.09     0.03    0.71          90         100
QPLAN         1.52     4.32   0.23   0.62     0.16    7.3           55          68
RDTOK         1.22     0.95   0.07   0.26     0.05    2.92          98          44
READ          1.05     5.25   0.03   0.34     0.16    6.87          44          52
WARPLAN       0.82     1.67   0.03   0.27     0.03    3.18          96          36
mean          23%      54%    2%     7%       2%      100%          87          71%



6 Conclusion 

We have presented the main algorithms of cTI, our bottom-up left-termination inference tool for logic programs, and given some running times for standard LP termination programs and middle-sized logic programs. The analysis requires three fixpoint computations and the inference of well-founded orders. We have described some optimizations and measured their impact.

We have compared the quality of the results obtained by cTI with three other
termination checkers, which work top-down (we point out that the system
Ciao-Prolog [3] adopts another approach to termination, based on complexity
analysis [17]). In all the examples^ presented above, our termination inference
tool is able to infer at least as large a class of terminating queries (although
we manually tuned cTI three times). On the other hand, the running times of cTI
are also larger, but termination inference is a much more general problem than
termination checking: in the worst case, an exponential number of termination
checks is needed to simulate termination inference.

Right now, cTI cannot directly infer termination for some programs, e.g.
CHAT, as suggested by P. Tarau. A more detailed look at this program, written
by F.C.N. Pereira and D.H.D. Warren, shows that it contains one scc of 30
mutually recursive predicate symbols with 8 arguments per predicate symbol on
average. We cannot compute a numeric model for CHAT using the constraint
solver CLP(Q) of SICStus Prolog in reasonable time. So, for each computation
which may be too costly (see also cni), we add a timeout, and if necessary we
are able to return a value which does not destroy the correctness of the analysis
(this is another widening!). The point is that the theoretical framework only
requires a CLP(ℕ) model and an upper approximation of the CLP(𝔹) least model.
The drawback of this approach is that, in such a case, the quality of the
inference deteriorates. As a side effect, the running time of cTI is now linear
with respect to the number of sccs in the call graph. We point out that CHAT is
one of the very few examples we know of which require such a mechanism
(F. Henderson notified us of a similar scc in the code of the Mercury compiler
[12]). Finally, Table 5 points out that most (> 75%) of the analysis time lies in
numeric computations. As a last resort, we might consider using specialized C
libraries for polyhedron manipulation and a simplex solver optimized for
projection.

^ TermiLog is sometimes able to prove termination whereas cTI is not, and vice versa.

We are also developing another line of research where we try to prove the
optimality of the termination conditions computed by cTI. Instead of looking for
general classes of logic programs for which the analysis is complete, we try, for
each particular (pure) logic program, to prove that the termination condition
derived by cTI is as general as it can be (with respect to the language describing
the termination conditions and independently from the analysis). We have already
implemented a prototype of the analyzer (called nTI, for non-Termination
Inference, available at the same URL as cTI) and its formalization is in progress.

Acknowledgements: We thank the readers of this paper for their useful com- 
ments. 
Abstract. It is often useful to introduce probabilistic behavior in programs,
either because of the use of internal random generators (probabilistic
algorithms), or because of some external devices (networks, physical sensors)
with known statistics of behavior. Previous works on probabilistic abstract
interpretation have addressed safety properties, but somehow neglected
probabilistic termination. In this paper, we propose a method to automatically
prove the probabilistic termination of programs using exponential bounds on the
tail of the distribution. We apply this method to an example and give some
directions as to how to implement it. We also show that this method can be
applied to make unsound statistical methods on average running times sound.



1 Introduction 

In this paper, we propose an analysis scheme for probabilistic programs, with 
the goal of proving probabilistic termination and other probabilistic properties. 



1.1 Goals 

It is in general difficult to automatically prove liveness properties of programs; 
nevertheless, the availability of probabilistic information makes that task easier.
In this paper, we shall address the cases of termination where the probability of 
taking a loop k times decreases at least exponentially with k. Such cases happen, 
for instance, in waiting loops on hardware peripherals where the probability 
that the busy peripheral will become ready over a period of time is a constant. 
However, we address cases where the constant in the exponential is unknown. 
Our analysis tries to derive automatically suitable constants for the exponential 
bound. 

An obvious application of this analysis is to prove that the probability of 
non-termination is zero. Other applications include improvements on statistical 
methods to derive average execution times. 

Our analysis is explained on block-structured programs. A reason for this is 
that the control-flow structure of a block-structured program is generally ap- 
parent from its syntactic structure, whereas converting the program to a proba- 
bilistic transition system would hide that structure. It is nevertheless possible to 
apply the method to a probabilistic transition system, considering it as a single
while loop, although the results may not be very satisfactory. 

As usual with abstract interpretation, our analysis is sound, that is, any result 
it gives is proven to be correct (provided that the implementation of the analysis 
is correct, of course). On the other hand, because of the essential undecidability 
of non-trivial program properties, our analyzer is forced to give non-optimal 
results. Careful experimentation and heuristics may then improve the analysis. 



1.2 Related Works 



Several analyses have been proposed to prove the termination of non-probabilistic 
programs. Generally speaking, proving the termination of a program is done by 
showing that some value taken in a well-founded ordered set decreases strictly 
as the program proceeds [7]. The problem is to detect the appropriate value 
and the appropriate order. We follow this general principle by showing that the 
probability that the loop count or execution time reaches k is bounded by a 
strictly decreasing function of k whose limit is 0; the choice of an exponential is 
natural since the distribution we wish to approximate is exactly exponential in 
some simple cases (where the control graph of the program is a Markov chain, 
as in §6.1). 

We use the framework of abstract interpretation of probabilistic programs 
defined in our earlier work [8]. However, that work did not address the problem 
of probabilistic termination except perhaps in its most trivial cases (where there
exists a constant number of iterations after which the program always terminates,
regardless of the inputs). Here, we address a specific case (loops whose iteration
count or total execution time follow a distribution bounded from above by a 
decreasing exponential) that was not addressed by our previous methods, yet is of 
common practical interest. Nevertheless, the analysis described in this paper can 
be “mixed” with the analysis described in [8]; both are likely to be implemented 
in the same system. 



1.3 Overview of the Paper 



In section 2, we explain the probabilistic concrete semantics that we analyze. In 
section 3, we explain how to apply abstract interpretation to such a semantics. 
In section 4, we give a first abstract domain able to express properties such as 
the exponential decrease of the tail of the distribution of an integer variable. 
In section 5, we explain how to build a more complex, but much more precise 
domain using elements of the former domain as building blocks. In section 6, 
we show on a simple example what kind of results the analysis achieves, and 
we explain how such an analysis can improve the mathematical soundness of 
experimental average time estimations. 
2 Concrete Semantics

We take the same concrete semantics as in earlier papers [5,6,8]. For the sake 
of clarity and self-containedness, we shall summarize them here. 

We shall express probabilities using measures [11, §1.18]. We shall begin with
a few classical mathematical definitions. 

2.1 Measures 

The basic objects we shall operate on are measures. 

- $\mathcal{P}(X)$ is the power-set of $X$.

- A $\sigma$-algebra is a subset of $\mathcal{P}(X)$ that contains $\emptyset$ and is stable by countable union and complementation (and thus contains $X$ and is stable by countable intersection).

- A set $X$ with a $\sigma$-algebra $\sigma_X$ defined on it is called a measurable space and the elements of the $\sigma$-algebra are the measurable subsets. We shall often mention measurable spaces by their name, omitting the $\sigma$-algebra, if no confusion is possible.

- If $X$ and $Y$ are measurable spaces, $f : X \to Y$ is a measurable function if for all $W$ measurable in $Y$, $f^{-1}(W)$ is measurable in $X$.

- A positive measure is a function $\mu$ defined on a $\sigma$-algebra $\sigma_X$ whose range is in $[0, \infty]$ and which is countably additive. $\mu$ is countably additive if, taking $(A_n)_{n \in \mathbb{N}}$ a disjoint collection of elements of $\sigma_X$, then $\mu\left(\bigcup_{n=0}^{\infty} A_n\right) = \sum_{n=0}^{\infty} \mu(A_n)$. To avoid trivialities, we assume $\mu(A) < \infty$ for at least one $A$. The total weight of a measure $\mu$ is $\mu(X)$. $\mu$ is said to be concentrated on $A \subseteq X$ if for all $B$, $\mu(B) = \mu(B \cap A)$. We shall note $\mathcal{M}_+(X)$ the positive measures on $X$.

- A probability measure is a positive measure of total weight 1. A $\sigma$-finite measure is a measure $\mu$ so that there exists a countable partition $X = \bigcup_{k} X_k$ so that $\mu(X_k) < \infty$ for all $k$. This is a technical condition that will be met every time in our cases.

- Given two $\sigma$-finite measures $\mu$ and $\mu'$ on $X$ and $X'$ respectively, we note $\mu \otimes \mu'$ the product measure [11, definition 7.7], defined on the product $\sigma$-algebra $\sigma_X \times \sigma_{X'}$. The characterizing property of this product measure is that $\mu \otimes \mu'(A \times A') = \mu(A).\mu'(A')$ for all measurable sets $A$ and $A'$.

Our semantics shall be expressed as continuous linear operators between 
measure spaces. 

2.2 Semantics of Arithmetic Operators 

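Let us consider an elementary program statement $c$ so that $\llbracket c \rrbracket : X \to Y$, $X$ and $Y$ being measurable spaces. We shall also suppose that $\llbracket c \rrbracket$ is measurable, which is a purely technical requirement.

To $\llbracket c \rrbracket$ we associate the following linear operator $\llbracket c \rrbracket_p$:

$$\llbracket c \rrbracket_p : \mathcal{M}_+(X) \to \mathcal{M}_+(Y), \qquad \mu \mapsto \lambda W.\ \mu\left(\llbracket c \rrbracket^{-1}(W)\right).$$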
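As an illustration of this definition, the following C sketch applies $\llbracket c \rrbracket_p$ to a measure with finite support, stored as an array of weights indexed by the value of a single integer variable. The statement x := x mod 3, the bound NVALS and the function names are assumptions made for the example: each point x sends its weight to $\llbracket c \rrbracket(x)$, which computes $\mu(\llbracket c \rrbracket^{-1}(W))$ on singletons and is visibly linear in $\mu$.

```c
#include <stddef.h>

#define NVALS 6 /* hypothetical state space: one variable x in {0, ..., NVALS - 1} */

/* Hypothetical deterministic statement c: x := x mod 3. */
static int stmt_mod3(int x) {
    return x % 3;
}

/* [[c]]_p(mu): every value x sends its weight mu({x}) to [[c]](x), so that
   out({y}) = mu([[c]]^{-1}({y})). The map is linear in mu. */
static void push_forward(int (*stmt)(int), const double mu[NVALS], double out[NVALS]) {
    for (size_t i = 0; i < NVALS; i++)
        out[i] = 0.0;
    for (int x = 0; x < NVALS; x++)
        out[stmt(x)] += mu[x];
}
```

For instance, the uniform measure on {0, ..., 5} is mapped to the uniform measure on {0, 1, 2}.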
2.3 Random Inputs or Generators

An obvious interest of probabilistic semantics is to give an accurate semantics
to assignments such as x:=random() where random() is a function that, each
time it is invoked, returns a real value uniformly distributed between 0 and
1, independently of previous calls.^ We therefore have to give a semantics to
constructs such as x:=random();, where random returns a value in a measured
space $R$ whose probability is given by the measure $\mu_R$ and is independent of
all other calls and previous states.

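We decompose this operation into two steps:

$$\llbracket \texttt{x:=random()} \rrbracket : X \xrightarrow{\ \llbracket \rho\texttt{:=random()} \rrbracket\ } X \times R \xrightarrow{\ \llbracket \texttt{x:=}\rho \rrbracket\ } X$$

The second step is a simple assignment operator, addressed by the generic semantics for arithmetic operators. The first step is a product of measures:

$$\llbracket \rho\texttt{:=random()} \rrbracket_p : \mathcal{M}_+(X) \to \mathcal{M}_+(X \times R), \qquad \mu \mapsto \mu \otimes \mu_R \tag{1}$$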



In the rest of the paper, when dealing with random generators, we shall focus 
on this product operation. 



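^ Of course, functions such as the POSIX C function drand48() would not fulfill such requirements, since they are pseudo-random generators whose output depends on an internal state that changes each time the function is invoked, thus the probability laws of successive invocations are not independent. However, ideal random generators are quite an accurate approximation for most analyses.

2.4 Tests and Loops

We restrict ourselves to test and loop conditions $b$ so that $\llbracket b \rrbracket$ is measurable. $\llbracket b \rrbracket$ is the set of environments matched by condition $b$. It is obtained inductively from the set of environments matched by the atomic tests (e.g. comparisons):

- $\llbracket b_1 \texttt{ or } b_2 \rrbracket = \llbracket b_1 \rrbracket \cup \llbracket b_2 \rrbracket$
- $\llbracket b_1 \texttt{ and } b_2 \rrbracket = \llbracket b_1 \rrbracket \cap \llbracket b_2 \rrbracket$
- $\llbracket \texttt{not } b \rrbracket = \llbracket b \rrbracket^C$

The semantics of tests is:

$$\llbracket \texttt{if } c \texttt{ then } e_1 \texttt{ else } e_2 \rrbracket_p(\mu) = \llbracket e_1 \rrbracket_p \circ \phi_{\llbracket c \rrbracket}(\mu) + \llbracket e_2 \rrbracket_p \circ \phi_{\llbracket c \rrbracket^C}(\mu) \tag{2}$$

where $\phi_W = \lambda\mu.\lambda X.\ \mu(X \cap W)$. The semantics for the while loop is:

$$\llbracket \texttt{while } c \texttt{ do } e \rrbracket_p(\mu) = \sum_{n=0}^{\infty} \phi_{\llbracket c \rrbracket^C} \circ \left(\llbracket e \rrbracket_p \circ \phi_{\llbracket c \rrbracket}\right)^n(\mu) = \phi_{\llbracket c \rrbracket^C}\left(\sum_{n=0}^{\infty} \left(\llbracket e \rrbracket_p \circ \phi_{\llbracket c \rrbracket}\right)^n(\mu)\right) \tag{3}$$

Limits and infinite sums are taken according to the set-wise topology [3, §III.10]. We refer the reader to the extended version of [8] for the technical explanations on continuity and convergence.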
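When the state space is finite, equation 3 can be evaluated term by term. The C sketch below assumes a single integer variable k in {0, ..., 3}, the hypothetical loop while (k < 2) k := k + coin(), where coin() returns 1 with probability 1/2, and a sum truncated after a fixed number of terms; the n-th term is the mass that leaves the loop after exactly n iterations. It illustrates the formula only and is not part of the analysis.

```c
#define K 4 /* hypothetical state space: one integer variable k in {0, ..., K - 1} */

/* phi_W for the condition c = (k < 2): keep the mass where c has the
   truth value `when`, erase the rest. */
static void restrict_to(const double in[K], double out[K], int when) {
    for (int k = 0; k < K; k++)
        out[k] = ((k < 2) == when) ? in[k] : 0.0;
}

/* Hypothetical body e: k := k + coin(), coin() = 1 with probability 1/2. */
static void body(const double in[K], double out[K]) {
    for (int k = 0; k < K; k++)
        out[k] = 0.0;
    for (int k = 0; k < K; k++) {
        out[k] += 0.5 * in[k];
        if (k + 1 < K) /* only mass with k < 2 reaches the body, so nothing is dropped */
            out[k + 1] += 0.5 * in[k];
    }
}

/* Equation (3) truncated to `terms` terms:
   result = sum_{n < terms} phi_{[[c]]^C}((body o phi_{[[c]]})^n(mu)). */
static void while_semantics(const double mu[K], double result[K], int terms) {
    double nu[K], inside[K], leaving[K];
    for (int k = 0; k < K; k++) {
        nu[k] = mu[k];
        result[k] = 0.0;
    }
    for (int n = 0; n < terms; n++) {
        restrict_to(nu, leaving, 0); /* mass that exits after n iterations */
        for (int k = 0; k < K; k++)
            result[k] += leaving[k];
        restrict_to(nu, inside, 1);  /* mass that runs the body once more */
        body(inside, nu);
    }
}
```

Starting from k = 0 with probability 1, 200 terms put all but a negligible part of the mass on k = 2, which is the probabilistic termination of this loop.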

3 Abstract Semantics 

We first give the vocabulary and notations we use for abstractions in general 
(the reader is invited to consult [2] for further details). We then explain the 
particular treatment of probabilistic semantics. 

3.1 Summary of Abstraction 

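Let us consider a preordered set $X^\sharp$ and a monotone function $\gamma_X : X^\sharp \to \mathcal{P}(X)$. $x^\sharp \in X^\sharp$ is said to be an abstraction of $x^\flat \subseteq X$ if $x^\flat \subseteq \gamma_X(x^\sharp)$. $\gamma_X$ is called the concretization function. The triple $\langle \mathcal{P}(X), X^\sharp, \gamma_X \rangle$ is called an abstraction. $\mathcal{P}(X)$ is the concrete domain and $X^\sharp$ the abstract domain. Such definitions can be extended to any preordered set $X^\flat$ besides $\mathcal{P}(X)$.

Let us now consider two abstractions $\langle \mathcal{P}(X), X^\sharp, \gamma_X \rangle$ and $\langle \mathcal{P}(Y), Y^\sharp, \gamma_Y \rangle$ and a function $f : X \to Y$. $f^\sharp$ is said to be an abstraction of $f$ if

$$\forall x^\sharp \in X^\sharp\ \forall x \in X \quad x \in \gamma_X(x^\sharp) \Rightarrow f(x) \in \gamma_Y(f^\sharp(x^\sharp)) \tag{4}$$

More generally, if $\langle X^\flat, X^\sharp, \gamma_X \rangle$ and $\langle Y^\flat, Y^\sharp, \gamma_Y \rangle$ are abstractions and $f^\flat : X^\flat \to Y^\flat$ is a monotone function, then $f^\sharp$ is said to be an abstraction of $f^\flat$ if

$$\forall x^\flat \in X^\flat\ \forall x^\sharp \in X^\sharp \quad x^\flat \sqsubseteq \gamma_X(x^\sharp) \Rightarrow f^\flat(x^\flat) \sqsubseteq \gamma_Y(f^\sharp(x^\sharp)) \tag{5}$$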

Algorithmically, elements in $X^\sharp$ will have a machine representation. To any
program construct $c$ we shall attach an effectively computable function $\llbracket c \rrbracket^\sharp$ so
that $\llbracket c \rrbracket^\sharp$ is an abstraction of $\llbracket c \rrbracket$. Given a machine description of a superset of
the inputs of the programs, the abstract version yields a superset of the outputs
of the program. If a state is not in this superset, this means that, for sure, the
program cannot reach this state.

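Let us take an example, the domain of intervals: if $X^\sharp = Y^\sharp = T^3$ where $T = \{(a, b) \in (\mathbb{Z} \cup \{-\infty, +\infty\})^2 \mid a \le b\} \cup \{\bot\}$, $\gamma(a, b) = \{c \in \mathbb{Z} \mid a \le c \le b\}$ and $\gamma$ induces a preorder $\sqsubseteq_T$ over $T$ and, point-wise, over $X^\sharp$, then we can take

$$\llbracket \texttt{x:=y+z} \rrbracket^\sharp\left((a_x, b_x), (a_y, b_y), (a_z, b_z)\right) = \left((a_y + a_z, b_y + b_z), (a_y, b_y), (a_z, b_z)\right).$$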
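The interval transfer function above can be transcribed almost literally. In this C sketch, the type and function names are illustrative assumptions; infinite bounds are encoded with the IEEE infinities (integers are exact in a double up to 2^53), ⊥ is an explicit flag, and the sketch additionally maps a ⊥ argument to ⊥, a case the formula above leaves implicit.

```c
#include <math.h>
#include <stdbool.h>

/* An interval [lo, hi] over Z extended with -inf/+inf; bottom encodes the empty set. */
typedef struct {
    bool bottom;
    double lo, hi; /* doubles so that -INFINITY and +INFINITY are representable */
} interval;

/* Abstract environment for the three variables x, y, z. */
typedef struct { interval x, y, z; } abs_env;

/* Abstract transfer function for x := y + z. */
static abs_env assign_x_y_plus_z(abs_env e) {
    abs_env r = e;
    if (e.y.bottom || e.z.bottom) {
        r.x.bottom = true;        /* no concrete environment reaches this point */
    } else {
        r.x.bottom = false;
        r.x.lo = e.y.lo + e.z.lo; /* a_y + a_z */
        r.x.hi = e.y.hi + e.z.hi; /* b_y + b_z */
    }
    return r;                     /* y and z are left unchanged */
}
```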
3.2 Turning Fixpoints of Affine Operators into Fixpoints of
Monotone Set Operators

Equation 3 shows that the semantics of loops is given as infinite sums or,
equivalently, as fixpoints of some affine operators. In non-probabilistic semantics,
the semantics of loops is usually the fixpoint of some monotone operator on the
concrete lattice, which gets immediately abstracted as a fixpoint on the abstract
lattice. The approximation is not so evident in the case of this sum; we shall
nevertheless see how to deal with it using fixpoints on the abstract lattice.

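Defining $\mu_n$ recursively as follows: $\mu_0 = \lambda X.0$ and $\mu_{n+1} = \psi(\mu_n)$, with $\psi(\nu) = \mu + \llbracket e \rrbracket_p \circ \phi_{\llbracket c \rrbracket}(\nu)$, we can rewrite equation 3 as $\llbracket \texttt{while } c \texttt{ do } e \rrbracket_p(\mu) = \phi_{\llbracket c \rrbracket^C}\left(\lim_{n \to \infty} \mu_n\right)$. We wish to approximate this limit in the measure space by an abstract element.

We get this approximation by finding a set $L \in \mathcal{P}(\mathcal{M}_+(X))$ of measures that:

- contains some element $\mu_N$ of the sequence $(\mu_n)$,
- is stable by $\psi$,
- is topologically closed.

Then $\lim_{n \to \infty} \mu_n \in L$: stability by $\psi$ gives $\mu_n \in L$ for every $n \ge N$, and closedness then gives the limit.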

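Let us take $N \in \mathbb{N}$. Let us note $+^\sharp$ an abstraction of the sum operation on measures, $0^\sharp$ an abstraction of the null measure and $\psi^\sharp(\nu^\sharp) = W^\sharp +^\sharp \llbracket e \rrbracket^\sharp_p \circ \phi^\sharp_{\llbracket c \rrbracket}(\nu^\sharp)$.

Let us take $\mu \in \gamma(W^\sharp)$. By abstraction, $\mu_N \in \gamma(\psi^{\sharp N}(0^\sharp))$.

Let us suppose that we have an “approximate least fixpoint” operation $\mathrm{lfp}^\sharp : (X^\sharp \to X^\sharp) \to X^\sharp$. By “approximate least fixpoint” we mean that if $\phi^\sharp$ is an abstraction of $\phi$, then $\mathrm{lfp}^\sharp \phi^\sharp$ is an abstraction of $\mathrm{lfp}\,\phi$. The next sub-section will explain how to implement such an operation using widening operators.

Let us consider $L = \gamma\left(\mathrm{lfp}^\sharp\, \lambda X^\sharp.\ \psi^{\sharp N}(0^\sharp) \cup^\sharp \left(W^\sharp +^\sharp \llbracket e \rrbracket^\sharp(\phi^\sharp_{\llbracket c \rrbracket}(X^\sharp))\right)\right)$. $L$ contains $\mu_N$ and is stable by $\psi$.

We can therefore take:

$$\llbracket \texttt{while } c \texttt{ do } e \rrbracket^\sharp(W^\sharp) = \phi^\sharp_{\llbracket c \rrbracket^C}\left(\mathrm{lfp}^\sharp\, \lambda X^\sharp.\ \psi^{\sharp N}(0^\sharp) \cup^\sharp \left(W^\sharp +^\sharp \llbracket e \rrbracket^\sharp(\phi^\sharp_{\llbracket c \rrbracket}(X^\sharp))\right)\right) \tag{6}$$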



3.3 Approximation of Least Fixpoints 

As noted before, we need an “approximate least fixpoint” operation $\mathrm{lfp}^\sharp : (X^\sharp \xrightarrow{\text{monotonic}} X^\sharp) \to X^\sharp$. We shall see here that such an operation can be implemented using widening operators ([2]; see §4.4, 5.4 for the widening operators that we propose for our particular application).

A widening operator $\nabla : X^\sharp \times X^\sharp \to X^\sharp$ is such that

- for all $x^\sharp, y^\sharp$, $x^\sharp \sqcup y^\sharp \sqsubseteq x^\sharp \nabla y^\sharp$;
- for any ascending sequence $(y^\sharp_n)$ and any $x^\sharp_0$, the sequence $(x^\sharp_n)$ defined by $x^\sharp_{n+1} = x^\sharp_n \nabla y^\sharp_n$ is ultimately stationary.
Informally, a widening operator is a kind of “convergence accelerator” for
ascending sequences of elements of the lattice; it allows obtaining in finite time 
an over-approximation of the limit of the sequence (which can be reached in 
infinite time). 

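Let us suppose we have a monotonic $\phi : X \to X$ and an abstraction of it $\phi^\sharp : X^\sharp \to X^\sharp$. Let us define $x^\sharp_0 = \bot$ and $x^\sharp_{n+1} = x^\sharp_n \nabla \phi^\sharp(x^\sharp_n)$. Since $\nabla$ is a widening operator, $x^\sharp_n$ is ultimately stationary. Let us call $L^\sharp$ its limit. We have $L^\sharp = L^\sharp \nabla \phi^\sharp(L^\sharp)$. Since $\nabla$ is a widening operator, $\phi^\sharp(L^\sharp) \sqsubseteq L^\sharp$. Let us apply $\gamma$ (monotonic) to both sides of the inequality: $\gamma \circ \phi^\sharp(L^\sharp) \sqsubseteq \gamma(L^\sharp)$. Since $\phi^\sharp$ is an abstraction of $\phi$, $\phi \circ \gamma(L^\sharp) \sqsubseteq \gamma \circ \phi^\sharp(L^\sharp)$. It follows that $\phi(\gamma(L^\sharp)) \sqsubseteq \gamma(L^\sharp)$.

By Tarski's theorem [2, §4.1], $\mathrm{lfp}\,\phi = \bigsqcap_{\phi(x) \sqsubseteq x} x$. $\gamma(L^\sharp)$ is thus an upper approximation of the least fixpoint of $\phi$, obtained by a finite computation using the widening operator.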



4 Basic Abstract Domain and Abstract Operations 

We wish to represent sets of (sub-)probability measures symbolically. In this
section, we give a basic abstract domain expressing exponentially decreasing
tails of probabilistic distributions on integers. This abstract domain is not very
satisfactory by itself (it is very crude when it comes to loops), but is a basis for
the definition of a more complex domain (section 5).

4.1 Abstract Domain 

Let $V$ be the set of variables. Each element of the abstract domain $E$ is a tuple of coefficients. Those coefficients will represent three kinds of known facts on the measures:

- an upper bound $W$ on their total weight;
- upper bounds on the intervals of probable variation for the integer variables: for any variable $v$, the probability that $v$ is outside $[a_v, b_v]$ is 0; of course, $a_v$ and/or $b_v$ can be infinite ($a_v \le b_v$);
- for each integer variable $v$, some data $C_v$ on its exponential decrease: either none or a pair $(\alpha_v, \beta_v) \in \mathbb{R}_+ \times [0, 1[$ meaning that the probability that variable $v$ is $k$ is bounded by $\alpha_v \beta_v^k$.

$\gamma_E(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V})$ is the set of all measures matching the above conditions. We shall note $\mu(\mathit{condition})$ for the application of the measure $\mu$ to the set of environments matching condition $\mathit{condition}$. The three conditions above then get written as:

$$\mu \in \gamma_E(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \iff \begin{cases} \mu(\mathit{true}) \le W \\ \forall v \in V\ \ \mu(v \notin [a_v, b_v]) = 0 \\ \forall v \in V\ \ C_v = (\alpha_v, \beta_v) \Rightarrow \forall k \in \mathbb{Z}\ \ \mu(v = k) \le \alpha_v \beta_v^k \end{cases} \tag{7}$$
4.2 Arithmetic Operations

We do not provide an abstract operator for each of the basic operations that a
program may encounter; for instance, we say nothing of multiplication. In cases
that are not described, we just apply interval propagation [1] and set $C_v$ = none
for every modified variable $v$. For some operations, we shall only describe some
of the cases; the others can be handled by using the symmetry of the operation
and reverting to a described case.

We focus on the operations that will be most useful for our analysis goals 
(number of iterations taken in a loop, number of used CPU cycles). For instance,
we consider the case of the arithmetic plus since it will be used to count loop 
iterations or program cycles, whereas multiplication is of little use for such tasks. 

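Arithmetic Plus. We define here the abstract operation $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \mapsto (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V}) = \llbracket \texttt{z:=x+y} \rrbracket^\sharp.(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V})$.

The distribution after applying an arithmetic plus obeys the following convolution equation:

$$(\llbracket \texttt{x+y} \rrbracket_p.\mu)(z = t) = \sum_{k=-\infty}^{+\infty} \mu(x = k \wedge y = t - k). \tag{8}$$

Let us suppose that $\mu \in \gamma_E(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V})$; we want to produce $(W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$ so that $\llbracket \texttt{x+y} \rrbracket_p.\mu \in \gamma_E(W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$.

Obviously we can take $W' = W$, $a'_z = a_x + a_y$, $b'_z = b_x + b_y$, and $a'_v = a_v$, $b'_v = b_v$, $C'_v = C_v$ for all $v \neq z$.

For $C'_z$, we therefore have four cases:

- $C_x$ = none and $C_y$ = none; then $C'_z$ = none;
- $C_x$ = none and $C_y = (\alpha_y, \beta_y)$. We then have $\mu(x = k \wedge y = t - k) \le \alpha_y \beta_y^{t-k}$ if $k \in [a_x, b_x]$, and $\mu(x = k \wedge y = t - k) = 0$ otherwise. Equation 8 then yields $\mu(x + y = t) \le \alpha_y \sum_{k=a_x}^{b_x} \beta_y^{t-k}$. Let $\alpha'_z = \alpha_y \beta_y^{-b_x} \frac{1 - \beta_y^{b_x - a_x + 1}}{1 - \beta_y}$; in particular, if $a_x = b_x$ (variable $x$ is actually a constant), then $\alpha'_z = \alpha_y \beta_y^{-a_x}$. Then $(\llbracket \texttt{x+y} \rrbracket_p.\mu)(z = t) \le \alpha'_z \beta_y^t$. If $\alpha'_z = \infty$, we take $C'_z$ = none, else we take $C'_z = (\alpha'_z, \beta_y)$.
- $C_x = (\alpha_x, \beta_x)$ and $C_y$ = none; this is mutatis mutandis the previous case.
- $C_x = (\alpha_x, \beta_x)$ and $C_y = (\alpha_y, \beta_y)$; we then apply the previous cases and take the greatest lower bound of both.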

4.3 Random Generation 

We define here the abstract operation $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \mapsto (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V}) = \llbracket \rho\texttt{:=random} \rrbracket^\sharp.(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V})$.

Let us recall that $\llbracket \rho\texttt{:=random} \rrbracket_p.\mu = \mu \otimes \mu_R$ where $\mu_R$ is the distribution of the generator. Let us note $W_R$ the total weight of $\mu_R$ (it can be less than 1, see §5.3). We take $W' = W_R.W$, and for any variable $v$ except $\rho$, $a'_v = a_v$ and $b'_v = b_v$; if $C_v = (\alpha_v, \beta_v)$ then $C'_v = (W_R.\alpha_v, \beta_v)$, else $C'_v$ = none. If the generator has an integer output and is bounded in $[a_R, b_R]$, then $a'_\rho = a_R$ and $b'_\rho = b_R$. $C'_\rho$ = none.
4.4 Flow Control

As with preceding operations, we only define the cases that will be actually used 
for our analysis goals. Other cases are handled by simple interval propagation 
and setting Cy to none for every modified variable v. 



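Least Upper Bound. We define here the abstract operation $\left((W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}), (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})\right) \mapsto (W'', (a''_v, b''_v)_{v \in V}, (C''_v)_{v \in V})$, noted as $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \sqcup_E (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$.

Given $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V})$ and $(W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$, we want $(W'', (a''_v, b''_v)_{v \in V}, (C''_v)_{v \in V})$ so that $\gamma_E(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \cup \gamma_E(W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V}) \subseteq \gamma_E(W'', (a''_v, b''_v)_{v \in V}, (C''_v)_{v \in V})$ with little loss of precision. Let us take $W'' = \max(W, W')$ and for all $v$:

- $a''_v = \min(a_v, a'_v)$ and $b''_v = \max(b_v, b'_v)$;
- if $C_v = (\alpha_v, \beta_v)$ and $C'_v = (\alpha'_v, \beta'_v)$ then we take $\beta''_v = \max(\beta_v, \beta'_v)$ and $\alpha''_v = \max\left(\alpha_v \beta_v^{a''_v} {\beta''_v}^{-a''_v},\ \alpha'_v {\beta'_v}^{a''_v} {\beta''_v}^{-a''_v}\right)$;
- if $C_v$ = none or $C'_v$ = none then $C''_v$ = none.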



Widening. As noted before (§3.3), we need some widening operators to approximate least fixpoints.

We shall define here the abstract operation $\left((W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}), (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})\right) \mapsto (W'', (a''_v, b''_v)_{v \in V}, (C''_v)_{v \in V})$, noted as $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \nabla_E (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$.

$(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \nabla_E (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$ should be higher than $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) \sqcup_E (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$. We also require the sequence defined inductively by $\mu^\sharp_{n+1} = \mu^\sharp_n \nabla_E \mu'^\sharp_n$ to be stationary, for any sequence $(\mu'^\sharp_n)_{n \in \mathbb{N}}$. The main interest of such an operator is the computation of least fixpoints (see §3.3).

We shall use a widening operator $\nabla_{\mathbb{R}}$ on the reals:

- $x \nabla_{\mathbb{R}} y = +\infty$ if $x < y$;
- $x \nabla_{\mathbb{R}} y = x$ otherwise.

Intuitively, using this operator means that if a sequence of real coefficients keeps ascending, we get an upper approximation of it using $+\infty$.

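Let us take $W'' = \max(W, W')$ and for all $v$:

- if $a_v > a'_v$, then $a''_v = -\infty$, else $a''_v = a_v$;
- if $b_v < b'_v$, then $b''_v = +\infty$, else $b''_v = b_v$;
- two cases:
  • if $a''_v = -\infty$ or $C_v$ = none or $C'_v$ = none, then $C''_v$ = none;
  • otherwise, if $C_v = (\alpha_v, \beta_v)$ and $C'_v = (\alpha'_v, \beta'_v)$, then we take $\beta''_v = \exp\left(-\left((-\ln \beta_v) \nabla_{\mathbb{R}} (-\ln \beta'_v)\right)\right)$ and $\alpha''_v = \left(\alpha_v \beta_v^{a''_v} \nabla_{\mathbb{R}} \alpha'_v {\beta'_v}^{a''_v}\right).{\beta''_v}^{-a''_v}$. If $\alpha''_v < \infty$ then $C''_v = (\alpha''_v, \beta''_v)$, otherwise $C''_v$ = none.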
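The real widening and its use on the support bounds can be sketched in C as follows (function names are assumptions). The sketch covers only the two items on $a''_v$ and $b''_v$, not the update of the tail coefficients; the lower bound is handled by applying $\nabla_{\mathbb{R}}$ to negated values.

```c
#include <math.h>

/* x widen_R y = +infinity if x < y, and x otherwise. */
static double widen_R(double x, double y) {
    return x < y ? INFINITY : x;
}

/* Widening of the support bounds [a, b] (old) and [a1, b1] (new) of one variable:
   a bound that moved outwards jumps to infinity. */
static void widen_bounds(double a, double b, double a1, double b1,
                         double *a2, double *b2) {
    *a2 = -widen_R(-a, -a1); /* a > a1 gives -infinity, otherwise a */
    *b2 = widen_R(b, b1);    /* b < b1 gives +infinity, otherwise b */
}
```

With $[a_k, b_k] = [1, 2]$ followed by $[1, 3]$, it yields $[1, +\infty]$, the same widening step as in the example of §6.1.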
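Addition. We shall define here the abstract operation $\left((W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}), (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})\right) \mapsto (W'', (a''_v, b''_v)_{v \in V}, (C''_v)_{v \in V})$, noted as $(W, (a_v, b_v)_{v \in V}, (C_v)_{v \in V}) +^\sharp_E (W', (a'_v, b'_v)_{v \in V}, (C'_v)_{v \in V})$. Two cases are of particular interest:

- For all $t \in \mathbb{Z}$, $\mu(v = t) \le \alpha_v \beta_v^t$, $t \notin [a_v, b_v] \Rightarrow \mu(v = t) = 0$, $t \notin [a'_v, b'_v] \Rightarrow \mu'(v = t) = 0$, and $\mu'(\mathit{true}) \le W'$. Two cases:
  • $b'_v < a_v$: then let us take $\alpha''_v = \max(\alpha_v, W'.\beta_v^{-b'_v})$ and $\beta''_v = \beta_v$; then $(\mu + \mu')(v = t) \le \alpha''_v {\beta''_v}^t$ (see Fig. 1 for an example);
  • otherwise, let us take $\alpha''_v = \alpha_v + W'.\beta_v^{-b'_v}$ and $\beta''_v = \beta_v$; then $(\mu + \mu')(v = t) \le \alpha''_v {\beta''_v}^t$.
- For all $t \in \mathbb{Z}$, $t \notin [a_v, b_v] \Rightarrow \mu(v = t) = 0$ and $t \notin [a'_v, b'_v] \Rightarrow \mu'(v = t) = 0$, where $a'_v > b_v$, with $\mu(\mathit{true}) \le W$ and $\mu'(\mathit{true}) \le W'$. Let us take $\beta''_v = (W'/W)^{1/(b'_v - b_v)}$ and $\alpha''_v = W.{\beta''_v}^{-b_v}$; then $(\mu + \mu')(v = t) \le \alpha''_v {\beta''_v}^t$ (see Fig. 2 for an example).

Fig. 1. Abstract addition of measures. $C_x = (1, 0.3)$, $a_x$ = S, $b_x = +\infty$, $W' = 0.7$, $C'_x$ = none, $a'_x = 0$, $b'_x = 2$. For the sake of readability, the discrete distributions are extended to continuous ones. The near-vertical slopes are artefacts of the plotting software replacing vertical slopes.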

Fig. 2. Abstract addition of measures. $C_x$ = none, $a_x = 0$, $b_x = 1$, $W = 1$, $a'_x = 2$, $b'_x = 3$, $W' = 0.856$.

4.5 Machine Reals in Our Abstract Domain

Our abstract domain makes ample use of real numbers: each coefficient $\alpha_v$ or
$\beta_v$ is a priori a real number. A possible implementation of these coefficients is
machine reals (IEEE 754 or similar); another is rational numbers as quotients
of arbitrary-precision integers. We discuss here the possible consequences of the
use of machine reals for the implementation of those coefficients.

The use of machine reals leads to two implementation problems. The first 
problem is that it is difficult to ascertain the loss of precision induced by machine 
reals throughout computations: how can the user be convinced that the output 
of the analyzer is sound? This can be worked around by using directed rounding 
modes. 

A more annoying problem, relevant in implementation, is the fact that accrued
imprecisions may lead to “drift”, especially when using directed rounding
modes. By “drift”, we mean that a sequence of real numbers that, mathematically
speaking, should appear to be stationary may be strictly ascending in the
floating-point approximation. In that case, our widening operator on the reals
$\nabla_{\mathbb{R}}$ (§4.4) may jump prematurely to $+\infty$.

It may be desirable to consider that small changes in some real coefficients do 
not indicate that the coefficient is actually changing but rather indicate some loss 
of precision. We expect the current work on abstract domains for real numbers 
[10] to provide better solutions to that problem. 
5 Abstract Domain of Finite Sums

The preceding domain is not yet very suitable to handle random generation. 
In this section, we lift it to the domain of its finite sums. This is similar to
[8, section 4].



5.1 Definition 

We shall define another domain $S$ and another concretization function $\gamma_S$. $S$ consists of finite tuples of elements of $E$ or, more interestingly, of a reduced product [2, §4.2.3.3] $P$ of $E$ and another domain $D$. For our subsequent examples, we shall use for $P$ the product of $E$ and the lattice of real intervals (for real variables). $\gamma_P$ is the concretization function for this reduced product: $\gamma_P(e^\sharp, d^\sharp) = \gamma_E(e^\sharp) \cap \gamma_D(d^\sharp)$.

As a running example we shall use for $D$ the lifted domain of real intervals: to each real variable $v$ we attach an interval $[a_v, b_v]$, with possibly infinite bounds. The probability that variable $v$ is outside $[a_v, b_v]$ is zero.

Items of $S$ are therefore finite tuples of elements of $P$. The concretization function $\gamma_S$ is defined as follows: $\mu \in \gamma_S(p^\sharp_1, \ldots, p^\sharp_n)$ if and only if there exist $\mu_1 \in \gamma_P(p^\sharp_1), \ldots, \mu_n \in \gamma_P(p^\sharp_n)$ so that $\mu = \sum_{k=1}^{n} \mu_k$.



5.2 Arithmetic Operations 

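Deterministic Operations. Basic operations are handled by linearity: since for any program construct $P$ its semantics $\llbracket P \rrbracket_p$ is a linear operator, it follows that if $\mu \in \gamma_S(p^\sharp_1, \ldots, p^\sharp_n)$ then $\llbracket P \rrbracket_p.\mu \in \gamma_S(\llbracket P \rrbracket^\sharp.p^\sharp_1, \ldots, \llbracket P \rrbracket^\sharp.p^\sharp_n)$.

The abstract operator is therefore constructed component-wise:

$$\llbracket P \rrbracket^\sharp_S.(p^\sharp_1, \ldots, p^\sharp_n) = (\llbracket P \rrbracket^\sharp.p^\sharp_1, \ldots, \llbracket P \rrbracket^\sharp.p^\sharp_n).$$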



5.3 Random Operations 

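Let us consider a random generator $G$ operating according to the distribution $\mu_R$. The semantics of the random operation is $\llbracket \texttt{random} \rrbracket_p.\mu = \mu \otimes \mu_R$ (Equ. 1). We shall suppose that our underlying domain $P$ has an abstract operation $\otimes^\sharp$.

Let us suppose that $\mu_R = \sum_{k=1}^{m} \mu_k$. Then $\mu \otimes \mu_R = \sum_{k=1}^{m} \mu \otimes \mu_k$; thus if $\mu \in \gamma_S(p^\sharp_1, \ldots, p^\sharp_n)$ then $\mu \otimes \mu_R \in \gamma_S(p^\sharp_1 \otimes^\sharp \mu^\sharp_1, \ldots, p^\sharp_1 \otimes^\sharp \mu^\sharp_m, \ldots, p^\sharp_n \otimes^\sharp \mu^\sharp_1, \ldots, p^\sharp_n \otimes^\sharp \mu^\sharp_m)$, where each $\mu^\sharp_k$ is an abstraction of $\mu_k$.

As an example, let us consider the case of a uniform random generator in $[0, 1]$. Let us “cut” it into $N$ equal sub-segments: let us note $\mu_R$ the uniform measure on $[0, 1]$. $\mu_R = \sum_{i=1}^{N} \mu_i$ with $\mu_i(X) = \mu_R(X \cap [(i-1)/N, i/N])$. For the abstraction of $\mu \mapsto \mu \otimes \mu_i$ over our example for the domain $P$: $(e, (d_1, \ldots, d_n)) \mapsto$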
5.4 Flow Control

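$$\llbracket \texttt{while } c \texttt{ do } e \rrbracket^\sharp(W^\sharp) = \phi^\sharp_{\llbracket c \rrbracket^C}\left(\mathrm{lfp}^\sharp\, \lambda X^\sharp.\ \psi^{\sharp N}(0^\sharp) \cup^\sharp \left(W^\sharp +^\sharp \llbracket e \rrbracket^\sharp(\phi^\sharp_{\llbracket c \rrbracket}(X^\sharp))\right)\right)$$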

Addition. Addition is very simple:

$$(p_1, \ldots, p_n) +^\sharp (p'_1, \ldots, p'_{n'}) = (p_1, \ldots, p_n, p'_1, \ldots, p'_{n'}) \tag{9}$$



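Loops. For accuracy reasons, we wish $\llbracket \texttt{while } b \texttt{ do } e \rrbracket^\sharp.(p^\sharp_1, \ldots, p^\sharp_n)$ to operate separately on each component $p^\sharp_1, \ldots, p^\sharp_n$. We thus define an auxiliary function $H^\sharp$ so that $\llbracket \texttt{while } b \texttt{ do } e \rrbracket^\sharp.(p^\sharp_1, \ldots, p^\sharp_n) = (H^\sharp.p^\sharp_1, \ldots, H^\sharp.p^\sharp_n)$. $H^\sharp$ is defined following equation 6:

$$H^\sharp(W^\sharp) = \phi^\sharp_{\llbracket b \rrbracket^C}\left(\mathrm{lfp}^\sharp\, \lambda X^\sharp.\ \mathrm{merge}\left(\psi^{\sharp N}(0^\sharp) \cup^\sharp \left(W^\sharp +^\sharp \llbracket e \rrbracket^\sharp(\phi^\sharp_{\llbracket b \rrbracket}(X^\sharp))\right)\right)\right),$$

where $\mathrm{merge}(p_1, \ldots, p_n) = (p_1 +^\sharp \cdots +^\sharp p_n)$.

We construct the “approximate least fixpoint operation” $\mathrm{lfp}^\sharp(f^\sharp)$ as follows: we consider the sequence $u^\sharp_0 = (0)$, $u^\sharp_{n+1} = u^\sharp_n \nabla_P f^\sharp(u^\sharp_n)$. All elements of the sequence are 1-tuples of elements of $P$. If $\nabla_P$ is a widening operator for the domain $P$, then this sequence is stationary, and its limit is an approximation of the least fixpoint of $f^\sharp$.

Such a $\nabla_P$ operator can be defined as follows: $(e, d) \nabla_P (e', d') = (e \nabla_E e', d \nabla_D d')$ where $\nabla_E$ is the widening operator defined at §4.4 and $\nabla_D$ a widening operator on $D$.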

6 Examples and Applications 

6.1 Proof of Probabilistic Termination 

Let us consider the following program: 

double uniform(void); /* assumed: returns a value uniformly distributed in [0, 1] */

double x, y;
int k = 0;
do
{
  x = uniform() + uniform();
  y = uniform();
  k++;
}
while (x < 1.4 || y < 0.2);

uniform() is assumed to be a random generator uniformly distributed in
[0, 1]. In this simple case, the probability that the loop is taken is independent 
of the input data and of the number of iterations done so far. Furthermore, we 
can compute it mathematically (0.856), with our knowledge of the distribution 
of the random variables and the computation of an integral. 
Let us examine here what happens if we apply the above method, dividing the
random generator into $N = 20$ sub-segments of equal length (§5.3). At the end of
the first iteration, after the merge, the analyzer establishes that the probability
of reaching 1 iteration is less than 0.8805.^ Applying the last case of §4.4, we
obtain $a_k = 1$, $b_k = 2$, $\beta_k = 0.8805$, $\alpha_k \approx 1.1357$. After another iteration, we
obtain $a_k = 1$, $b_k = 3$, $\beta_k = 0.8805$, $\alpha_k \approx 1.1357$. The analyzer widens to $a_k = 1$,
$b_k = \infty$, $\beta_k = 0.8805$, $\alpha_k \approx 1.1357$, which is stable.

We have therefore established that the probability that $k = x$ at the beginning
of the body of the loop is less than $0.8805^{x-1}$ (indeed $\alpha_k \beta_k^x = 0.8805^x / 0.8805$).
That is of course not as good as the exact computation, but still offers a reasonable bound.
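Both the exact value 0.856 and a split-based upper bound can be recomputed with a few lines of C. The sketch below is a simplified illustration, not the Absinthe algorithm: each of the three calls to uniform() is cut into n equal sub-segments as in §5.3, interval arithmetic gives a box for (x, y) in each of the n³ cases, and a case counts as possibly continuing the loop when its box meets the loop condition. With n = 20 it yields 0.868, a sound bound above the exact 0.856; it is not meant to reproduce the 0.8805 computed by the analyzer.

```c
/* Upper bound on the probability that the loop of section 6.1 runs its body again,
   obtained by cutting each of the three uniform() calls into n sub-segments. */
static double continue_upper_bound(int n) {
    long possibly = 0, total = 0;
    for (int i = 0; i < n; i++)           /* first uniform() in [i/n, (i+1)/n] */
        for (int j = 0; j < n; j++)       /* second uniform() */
            for (int l = 0; l < n; l++) { /* y = uniform() */
                double x_lo = (double)(i + j) / n; /* x in [x_lo, x_lo + 2/n] */
                double y_lo = (double)l / n;       /* y in [y_lo, y_lo + 1/n] */
                /* the condition x < 1.4 || y < 0.2 may hold somewhere in the box */
                if (x_lo < 1.4 || y_lo < 0.2)
                    possibly++;
                total++;
            }
    return (double)possibly / (double)total;
}

/* Exact value: 1 - P(y >= 0.2) * P(u1 + u2 >= 1.4) = 1 - 0.8 * (0.6 * 0.6 / 2). */
static double continue_exact(void) {
    return 1.0 - 0.8 * (0.6 * 0.6 / 2.0);
}
```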



6.2 Statistical Inference of Average Running Times 



It is often needed to derive statistical facts such as average running times for
real-time systems. Results achieved by the preceding method can be too rough
to give precise estimates; nevertheless, they can help make some experimental
statistical results mathematically rigorous.

Intuitively, to get the average running time of a system, we should take a large
number $n$ of random samples $(x_k)_{1 \le k \le n}$ for the input (distributed according to
the supposed or inferred distribution on the inputs of the system). Let us call
$R(x)$ the running time on input $x$; we then hope to have an approximation of
the mean $\bar{R}$ by taking $\frac{1}{n} \sum_{k=1}^{n} R(x_k)$.

Unfortunately, this method is not mathematically sound, essentially because
$R$ is not bounded, or rather because we have no information as to the very large
values of $R$ (the “tail” of the distribution). For instance, let us suppose that $R$
is 0 for 99.99% of the input values and $V$ for 0.01% of the input values. With
90% probability, the experimental average obtained by the Monte-Carlo method
using 1000 samples will be 0, whereas the correct value is $V/10000$. As $V$ can
be arbitrarily large, this means we cannot obtain any confidence interval on the
result.
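The 90% figure is the arithmetic $(1 - 10^{-4})^{1000} \approx 0.905$. The Python sketch below checks it and simulates the failure mode, using a hypothetical value $V = 10^9$ for the rare running time and an arbitrary random seed.

```python
import random

P_RARE = 0.0001   # R = V on 0.01% of the inputs, 0 elsewhere
N_SAMPLES = 1000  # Monte-Carlo sample size used in the text

# Probability that no sample hits a rare input: the empirical mean is then 0.
p_all_zero = (1 - P_RARE) ** N_SAMPLES

def empirical_mean(v: float, rng: random.Random) -> float:
    """Monte-Carlo estimate of the mean of R from N_SAMPLES random inputs."""
    total = sum(v if rng.random() < P_RARE else 0.0 for _ in range(N_SAMPLES))
    return total / N_SAMPLES

rng = random.Random(42)
V = 1e9  # hypothetical, arbitrarily large running time
estimates = [empirical_mean(V, rng) for _ in range(200)]
zero_fraction = sum(e == 0.0 for e in estimates) / len(estimates)

print(f"P(empirical mean = 0) = {p_all_zero:.4f}")
print(f"observed fraction of zero estimates: {zero_fraction:.2f}")
print(f"true mean V/10000 = {V * P_RARE:g}")
```

Whatever the value of $V$, roughly nine runs out of ten report a mean of 0, so the sample average alone says nothing about the true mean.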

On the other hand, if, using the analysis described in this paper, we can show
that the tail of the distribution decreases exponentially, we can get a confidence
bound. Let $P(A)$ denote the probability of an event $A$. Let us suppose that our
analysis has established that the probability of running a loop at least $k$ times
is less than $\alpha\beta^k$. Let $N > 1$. Let us run the program $n$ times, stopping each
time when either the loop terminates or it has been executed $N$ times. Let us
consider the random variable $R$ where $R(x)$ is the number of iterations the
program takes when run on input $x$. We wish to determine $\bar{R} = \sum_{k=0}^{\infty} k\,P(R = k)$.

To achieve this, we split the sum in two parts: $\bar{R} = \bar{R}_{\le N} + \bar{R}_{>N}$ where
$\bar{R}_{\le N} = \sum_{k=0}^{N} k\,P(R = k)$ and $\bar{R}_{>N} = \sum_{k=N+1}^{\infty} k\,P(R = k)$.

¹ This result has been computed automatically by the Absinthe abstract interpreter
http://cgi.dmi.ens.fr/cgi-bin/monniaux/absinthe.
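We wish to obtain an upper bound on $\bar{R}_{>N}$. Since $P(R = k) \le P(R \ge k) \le \alpha\beta^k$,
we have $\bar{R}_{>N} \le \alpha \sum_{k=N+1}^{\infty} k\beta^k$. For $a \le b$,

$$\sum_{k=a}^{b} k\beta^k = \frac{1}{(\beta-1)^2}\Big[\beta^{b+1}\big((b+1)(\beta-1)-\beta\big) - \beta^{a}\big(a(\beta-1)-\beta\big)\Big] \qquad (10)$$

and thus, letting $b \to \infty$ (since $0 \le \beta < 1$),

$$\sum_{k=a}^{\infty} k\beta^k = \frac{\beta^{a}\big(\beta - a(\beta-1)\big)}{(\beta-1)^2} .$$

Therefore $\bar{R}_{>N} \le \alpha \cdot \frac{\beta^{N+1}}{(\beta-1)^2}\big(\beta - (N+1)(\beta-1)\big)$.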
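The closed form can be cross-checked numerically. In this Python sketch, `tail_sum` implements the limit of Eq. (10) as $b \to \infty$, `tail_sum_direct` is a truncated brute-force sum used only for comparison, and `upper_bound_tail_mean` is a hypothetical helper returning the bound on $\bar{R}_{>N}$; the parameter values are arbitrary.

```python
def tail_sum(a: int, beta: float) -> float:
    """Closed form of sum_{k>=a} k * beta**k for 0 <= beta < 1 (Eq. (10), b -> oo)."""
    return beta ** a * (beta - a * (beta - 1)) / (beta - 1) ** 2

def tail_sum_direct(a: int, beta: float, terms: int = 20_000) -> float:
    """Truncated direct summation, used only to cross-check the closed form."""
    return sum(k * beta ** k for k in range(a, a + terms))

def upper_bound_tail_mean(alpha: float, beta: float, n_cut: int) -> float:
    """Bound on the tail part of the mean, given P(R >= k) <= alpha * beta**k."""
    return alpha * tail_sum(n_cut + 1, beta)

for a in (1, 5, 50):
    print(a, tail_sum(a, 0.8805), tail_sum_direct(a, 0.8805))
print(upper_bound_tail_mean(1.1357, 0.8805, 100))
```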

On the other hand, $\bar{R}_{\le N}$ can be estimated experimentally [9]. Let us consider
the random variable $R^*$: $R^*(x) = 0$ if $R(x) > N$ and $R^*(x) = R(x)/N$ if
$R(x) \le N$; this is a random variable whose range is a subset of $[0,1]$. It is
therefore possible to apply to $R^*$ the various techniques known for estimating
the expectation of such random variables (such as Chernoff-Hoeffding bounds
[4] or simple Gaussian confidence intervals, using the Central Limit Theorem).
We then obtain $\bar{R}_{\le N} = N \cdot \bar{R}^*$, where $\bar{R}^*$ is the expectation of $R^*$.

We therefore obtain a confidence upper bound on $\bar{R}$ by adding together the
confidence bound obtained on $\bar{R}_{\le N}$ and the upper bound obtained on $\bar{R}_{>N}$.
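Putting the two halves together, the Python sketch below uses a hypothetical loop whose test succeeds independently with probability $p$, so that $P(R \ge k) = p^k$ and the analysis hypothesis holds with $\alpha = 1$, $\beta = p$. It bounds $\bar{R}_{\le N}$ with a one-sided Hoeffding bound on $R^*$ [4] and adds the analytic bound on $\bar{R}_{>N}$. As a simplification of this sketch, each run evaluates the loop test once more after $N$ iterations to decide whether $R > N$; only the experimental half is random, so the result holds with probability at least $1 - \delta$.

```python
import math
import random
from typing import List, Optional

def run_truncated(rng: random.Random, p: float, n_cut: int) -> Optional[int]:
    """Hypothetical program: a loop whose test succeeds with probability p.
    Returns the number of iterations R, or None once it is known that R > n_cut."""
    k = 0
    while rng.random() < p:  # one evaluation of the loop test
        k += 1
        if k > n_cut:        # the test succeeded after n_cut iterations
            return None
    return k

def hoeffding_upper(xs: List[float], delta: float) -> float:
    """With probability >= 1 - delta, E[X] <= mean + sqrt(ln(1/delta) / (2n))
    for n i.i.d. samples of a random variable X with values in [0, 1]."""
    n = len(xs)
    return sum(xs) / n + math.sqrt(math.log(1 / delta) / (2 * n))

def mean_upper_bound(runs, n_cut: int, alpha: float, beta: float, delta: float) -> float:
    """Confidence upper bound on the mean of R: experimental part + tail part."""
    r_star = [0.0 if r is None else r / n_cut for r in runs]  # samples of R* in [0, 1]
    bound_le = n_cut * hoeffding_upper(r_star, delta)          # bounds R_{<=N}
    a = n_cut + 1
    bound_gt = alpha * beta ** a * (beta - a * (beta - 1)) / (beta - 1) ** 2  # bounds R_{>N}
    return bound_le + bound_gt

rng = random.Random(7)
p, n_cut = 0.856, 50
runs = [run_truncated(rng, p, n_cut) for _ in range(2000)]
# For this loop P(R >= k) = p**k, so alpha = 1 and beta = p satisfy the hypothesis.
bound = mean_upper_bound(runs, n_cut, 1.0, p, 0.01)
print(f"upper bound {bound:.3f} (true mean {p / (1 - p):.3f})")
```

Hoeffding's inequality is chosen here only because it needs no variance estimate; a Gaussian interval would be tighter but only asymptotically valid.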



7 Conclusions and Future Work 

We have proposed a method to automatically analyze the termination of certain
kinds of computer programs. This analysis outputs probabilistic invariants of
loops, in particular on the distribution of the number of loop iterations done or of the
number of program cycles used. The invariants obtained can be used in several
ways, including:

— proofs of probabilistic termination, 

— proofs of soundness of experimental statistical methods. 

The analysis is proved to be sound, as it is set in a framework of abstract
interpretation suitable for probabilistic programs. It has been partially
implemented in an automatic analyzer for a subset of the C programming language.
We plan to implement it fully in an analyzer for industrial embedded systems,
where it could serve as a quick method to validate busy-waiting loops and
similar constructs.

This analysis deals with a particular kind of probability distribution, typical,
for example, of the running time of a program waiting for data with probabilistic
timings on a network. Other kinds of distributions, such as the normal
(Gaussian) one, are of particular interest when analyzing embedded systems,
for instance to model error propagation. We plan to give analyses suitable for
such distributions.







David Monniaux 



References 

1. P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static
analysis of programs by construction or approximation of fixpoints. In Conference
Record of the 4th ACM Symposium on Principles of Programming Languages, pages
238-252, Los Angeles, CA, January 1977.

2. Patrick Cousot and Radhia Cousot. Abstract interpretation and application to
logic programs. J. Logic Prog., 13(2-3):103-179, 1992.

3. J.L. Doob. Measure Theory, volume 143 of Graduate Texts in Mathematics.
Springer-Verlag, 1994.

4. Wassily Hoeffding. Probability inequalities for sums of bounded random variables.
J. Amer. Statist. Assoc., 58(301):13-30, 1963.

5. D. Kozen. Semantics of probabilistic programs. In 20th Annual Symposium on
Foundations of Computer Science, pages 101-114, Long Beach, CA, USA, October
1979. IEEE Computer Society Press.

6. D. Kozen. Semantics of probabilistic programs. Journal of Computer and System
Sciences, 22(3):328-350, 1981.

7. Chin Soon Lee, Neil D. Jones, and Amir M. Ben-Amram. The size-change principle
for program termination. In ACM Symposium on Principles of Programming
Languages, volume 28, pages 81-92. ACM Press, January 2001.

8. David Monniaux. Abstract interpretation of probabilistic semantics. In Seventh
International Static Analysis Symposium (SAS’00), number 1824 in Lecture Notes
in Computer Science. Springer-Verlag, 2000. Extended version on the author’s web
site.

9. David Monniaux. An abstract Monte-Carlo method for the analysis of probabilistic
programs (extended abstract). In 28th Symposium on Principles of Programming
Languages (POPL ’01), pages 93-101. Association for Computing Machinery, 2001.

10. Eric Goubault. Static analyses of floating-point operations. In Static Analysis
(SAS’01), Lecture Notes in Computer Science, pages 234-259, Springer-Verlag,
July 2001.

11. Walter Rudin. Real and Complex Analysis. McGraw-Hill, 1966.




Watchpoint Semantics: A Tool 
for Compositional and Focussed Static Analyses 



Fausto Spoto 

Dipartimento Scientifico e Tecnologico 
Strada Le Grazie, 15, Ca’ Vignal, 37134 Verona, Italy 
spoto@sci.univr.it



Abstract. We abstract a denotational trace semantics for an impera- 
tive language into a compositional and focussed watchpoint semantics. 
Every abstraction of its computational domain induces an abstract, still 
compositional and focussed watchpoint semantics. We describe its
implementation and its instantiation with a domain of signs. The implementation
shows that the space and time costs of the analysis are proportional to the
number of watchpoints and that abstract compilation reduces those costs significantly.



1 Introduction 

A compositional analysis of a complex statement is defined in terms of that of its 
components. Then the analysis of a procedure depends only on that of the pro- 
cedures it calls and the analysis of a huge program can be easily kept up-to-date. 
This is important if a local change is applied to the program, during debugging 
or as a consequence of program transformation. A focussed or demand-driven 
analysis is directed to a given set of program points, called watchpoints, and has
a cost (in space and time) proportional to their number. This is important if 
only those points are relevant. For instance, zero information is typically useful 
only before a division. Class information for object-oriented programs is typically 
useful only before a method call. During debugging the programmer wants to 
analyse a program in very few points, with a cost proportional to their number. 

Our first contribution here is the definition of a compositional and focussed 
watchpoint semantics, as an abstract interpretation (AI) [?] specified by the
watchpoints of interest of a more concrete trace semantics. An optimality result
(Equation (4)) states that no precision is lost by this abstraction w.r.t. the
information at the watchpoints. The computational domain is identified as a data 
structure with some operations. The second contribution states that every AI of 
the computational domain induces an abstraction of the watchpoint semantics. 
This reduces the problem of static analysis to that of the development of abstract 
domains. The third contribution is the description of our implementation of the 
watchpoint semantics instantiated for sign analysis. It shows that the space and 
time costs of the analysis are proportional to the number of watchpoints. The 
final contribution is to show that abstract compilation [?] leads to a significant
improvement in the time and space costs of the analysis. 

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 1 27- irPn 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
1.1 Related Works



Traditionally, compositionality is synonymous with denotational semantics. But the
usual denotational semantics [?] provides an input/output characterisation of
procedures which is too abstract to observe their internal behaviour. This has
been recognised in [?], where Cousot models information at internal program
points through a more concrete, denotational trace semantics. Here, a procedure
is denoted by a map from an initial state to a trace of states representing its
execution from that initial state. Our trace semantics is an instance of the
maximal trace semantics of [?]. Even [?] observes that traces contain more
information than a traditional input/output denotation w.r.t. software pipelining,
loop-invariant removal and data alias detection. That framework is, however, based
on an operational definition. In [?], an operational trace semantics is defined
and abstracted through AI into an abstract trace semantics. The abstraction of
a trace is a (regular) tree, because of the non-deterministic nature of the
abstract semantics. Information at program points (what they call the collecting
semantics) is extracted from those trees after the fixpoint of the semantics is
reached. Thus, their analysis is not focussed, since the whole trace semantics
(the abstract trees) must be computed and then projected on program points.
This is justified by the fact that they are interested in properties of traces, like
those considered in the context of model-checking. However, for properties of
states, like in sign, interval, class and security context analysis of variables (and,
more generally, in all those analyses called first-order in [?], page 210), trees
are not needed and can be safely abstracted into sets of states.

Abstract denotational semantics need a representation of the abstract
input/output behaviour of a procedure. Since abstract inputs can partially
overlap, the meaning or, in abstract interpretation words, the concretisation of an
abstract denotation is not easy to devise. In Section 6 we show a solution
when the abstract domain elements can be written in terms of union-irreducible
elements. For the general case, we rely on the functional partitioning technique
defined in [?].

Focussed or demand-driven frameworks for analysis have been developed in
the past. In [?], backward propagation of assertions was applied to the debugging
of a higher-order imperative language. In [?], a backward dataflow analysis from
a given query is defined and shown to be more efficient than an exhaustive,
unfocussed analysis. The analyses in [?] are provably as precise as the
corresponding unfocussed versions for distributive finite dataflow problems, while our
optimality result (Equation (4)) holds for every abstract domain. Queries can be
checked to hold at a given program point, but cannot be computed by the analysis.
It is not shown how those analyses scale w.r.t. the number of queries. No notion
of computational domain is defined, which makes the definition of new abstract
analyses harder. Abstract compilation cannot be applied because those analyses
are not compositional. In [?], a very general and abstract way of looking at the
problem of localised analyses in a given set of program points is studied.
However, its actual application to a real programming language is not tackled
there.
Abstract compilation (AC) was born and applied only in the context of the
analysis of logic programs [?]. It is an optimised computation of the fixpoint of
an abstract semantics where, at the $i$-th iteration, part of the analysis is compiled
when it becomes clear that it will not change at the $(i+1)$-th iteration.

Our work has been heavily influenced by the systematic construction of
semantics for logic programming from the observable property of interest [?].
There, semantics for resultants, call patterns and computed answers of logic
programs are derived through AI of the very concrete semantics of SLD-derivations.
In particular, the call pattern semantics collects the information in some program
points only.



1.2 Plan of the Paper 

Section 2 introduces some preliminary notations. Section 3 defines the simple
imperative language used in the paper. Section 4 defines the concrete trace
semantics, which is then abstracted into our watchpoint semantics of Section 5
and its collecting version. Section 6 shows how every abstract interpretation of
the concrete computational domain induces an abstract watchpoint semantics.
Section 7 describes our implementation of the watchpoint semantics instantiated
for sign analysis. Section 8 concludes. Proofs are omitted.

2 Preliminaries 

A sequence of elements from a set $S$ is denoted by $seq(S)$. The cardinality of a
set $S$ is denoted by $\#S$. A definition like $S = (a, b)$, with $a$ and $b$ meta-variables,
silently defines the selectors $s.a$ and $s.b$ for $s \in S$. For instance, Definition 8
defines $t.w$ and $t.s$ for $t \in \mathcal{T}^w$. An element $x \in X$ will often stand for the
singleton $\{x\} \subseteq X$.

The domain (codomain) of a function $f$ is $dom(f)$ ($cd(f)$). A total (partial)
function is denoted by $\mapsto$ ($\rightharpoonup$). We denote by $[v_1 \mapsto t_1, \dots, v_n \mapsto t_n]$ a function
$f$ whose domain is $\{v_1, \dots, v_n\}$ and such that $f(v_i) = t_i$ for $i = 1, \dots, n$. Its
update is $f[d_1/w_1, \dots, d_m/w_m]$, where the domain can potentially be enlarged.
By $f|_s$ ($f|_{-s}$) we denote the restriction of $f$ to $s \subseteq dom(f)$ (to $dom(f) \setminus s$).

A pair $(C, \le)$ is a poset if $\le$ is reflexive, transitive and antisymmetric on $C$.
A poset is a complete lattice when least upper bounds (lub) and greatest lower
bounds (glb) always exist, and a complete partial order (CPO) when lubs exist for
the non-empty chains (totally ordered subsets). A CPO is pointed when it has a
bottom element. A map is additive when it preserves all lubs. If $f(x) = x$ then
$x$ is a fixpoint of $f$. If a least fixpoint exists, it is denoted by $lfp(f)$.

Let $(C, \le)$ and $(A, \preceq)$ be two posets (the concrete and the abstract domain).
A Galois connection [?] is a pair of monotonic maps $\alpha : C \mapsto A$ and $\gamma : A \mapsto C$
such that $\gamma\alpha$ is extensive and $\alpha\gamma$ is reductive. It is a Galois insertion when $\alpha\gamma$
is the identity map, i.e., when the abstract domain does not contain useless
elements. This is equivalent to $\alpha$ being onto, or $\gamma$ one-to-one. The abstraction $\alpha$
and the concretisation $\gamma$ determine each other. If $C$ and $A$ are complete lattices
and $\alpha$ is additive, it is the abstraction map of a Galois connection.
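To make these definitions concrete, the Python sketch below instantiates them with a simple sign domain: the concrete domain is $\wp(\mathbb{Z})$ ordered by $\subseteq$ and the abstract domain is $\wp(\{-, 0, +\})$. This powerset-of-signs domain is an assumption chosen for illustration; the sign domain used later in the paper may be structured differently. Since $\gamma(a)$ is infinite, it is represented by a membership test, and a small finite window of it is enough to check that $\alpha\gamma$ is the identity.

```python
from itertools import chain, combinations

SIGNS = frozenset({"-", "0", "+"})

def sign(x: int) -> str:
    return "-" if x < 0 else ("0" if x == 0 else "+")

def alpha(s: frozenset) -> frozenset:
    """Abstraction: the set of signs occurring in a set of integers."""
    return frozenset(sign(x) for x in s)

def in_gamma(a: frozenset, x: int) -> bool:
    """Concretisation as a membership test: gamma(a) = {x | sign(x) in a}."""
    return sign(x) in a

def gamma_window(a: frozenset, lo: int = -3, hi: int = 3) -> frozenset:
    """A finite window of gamma(a), large enough to contain every sign."""
    return frozenset(x for x in range(lo, hi + 1) if in_gamma(a, x))

def powerset(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(c) for c in subsets]

samples = [frozenset(), frozenset({-4, 7}), frozenset({0}), frozenset({-1, 0, 2})]
# gamma . alpha is extensive: every concrete set is included in gamma(alpha(S)).
assert all(in_gamma(alpha(s), x) for s in samples for x in s)
# alpha . gamma is the identity: a Galois insertion (no useless abstract elements).
assert all(alpha(gamma_window(a)) == a for a in powerset(SIGNS))
# alpha is additive: it preserves unions (lubs).
assert all(alpha(s | t) == alpha(s) | alpha(t) for s in samples for t in samples)
```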
An abstract operator $\hat{f} : A^n \mapsto A$ is correct w.r.t. $f : C^n \mapsto C$ if $\alpha f \gamma \preceq \hat{f}$.
For each operator $f$, there exists an optimal (most precise) correct abstract
operator $\hat{f}$ defined as $\hat{f} = \alpha f \gamma$. If $\hat{f}\alpha = \alpha f$, we say that $\hat{f}$ is $\alpha$-optimal w.r.t. $f$,
i.e., $\hat{f}$ computes the same abstract information as $f$.

In AI, the semantics of a program is the fixpoint of some $f : C \mapsto C$, where
$C$ is the computational domain [?]. Its collecting version [?] works over properties
of $C$, i.e., over $\wp(C)$, and is the fixpoint of the powerset extension of $f$. If $f$ is
defined through suboperations, their powerset extensions and $\cup$ (which merges
the semantics of the branches of a conditional) induce the extension of $f$.

3 A Simple Language 

Our language is designed to be easily extended, at the price of some redundancy in
the definitions. Integers are its only basic type, with two operations ($=$ and $+$).
Booleans are implemented as integers (false is represented by a negative integer and
true by any other integer). We do not have procedures but only functions. These
limitations are just meant to simplify the presentation. Our framework can be
extended to cope with the missing items.

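Definition 1. Let $Id$ be a set of identifiers and $\mathcal{F} \subseteq Id$ a finite set of function
symbols. Expressions $\mathcal{E}$ and commands $\mathcal{C}$ are defined by the grammar

$$e ::= i \mid v \mid f(v_1, \dots, v_n) \mid e = e \mid e + e$$

$$c ::= (v := e) \mid c; c \mid \mathbf{let}\ v{:}t\ \mathbf{in}\ c \mid \mathbf{if}\ e\ \mathbf{then}\ c\ \mathbf{else}\ c \mid \mathbf{while}\ e\ \mathbf{do}\ c$$

with $Type = \{int\}$, $Int = \mathbb{Z}$, $t \in Type$, $i \in Int$, $f \in \mathcal{F}$ and $v, v_1, \dots, v_n \in Id$.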

A typing gives types to a finite set of variables. The map Pars binds every 
function to the typing (its signature) and the list of its parameters. This list 
provides the order of the variables in the definition of the function. The function 
Code binds a function symbol to its (syntactically correct and type-checked) 
code. Local variables are always introduced by a let construct. 

Definition 2. We define $Typing = \{\tau : Id \rightharpoonup Type \mid dom(\tau) \text{ is finite}\}$,
$Code = (\mathcal{F} \mapsto \mathcal{C})$ and $Pars = (\mathcal{F} \mapsto seq(Id) \times Typing)$. If $p \in Pars$ and $f \in \mathcal{F}$, then
$p(f) = (s, \tau)$, where $f \in s$ and $dom(\tau) = s$. The variable $f$ holds the return value
of the function, like in Pascal.

A program is specified by $\mathcal{F}$ and two elements in $Code$ and $Pars$. In the
following, we assume that we have a program $P = (\mathcal{F}, c, p)$.

Expressions have a type in a typing. In our case, $type_\tau(e) = int$ for $e \in \mathcal{E}$.

Example 1. Figure ?(a) gives a representation of the program for computing
the n-th Fibonacci number (lines introduced by % are comments and will be
discussed later). Note that the name fib of the (only) function in the program
is used to hold its result value. Moreover, it is described by the Pars map $p$ of
the program.
Fig. 1. The signature and implementation of the operations over the states.



4 Trace Semantics 

The computational domain of states described here is used below to define a trace 
semantics for our language. Each of its abstractions will induce an abstraction of 
that semantics (Section 6), as usual in AI (see for instance [?]). More complex
notions of states could be used here, maybe dealing with locations and memory. 



Definition 3. Let $Value = Int$ and $\Sigma = \cup_{\tau \in Typing} \Sigma_\tau$ where, for $\tau \in Typing$,
states $\sigma \in \Sigma_\tau$ map variables to values consistent with their declared type, i.e.,

$\sigma \in dom(\tau) \mapsto Value$ and

for every $v \in dom(\tau)$, if $\tau(v) = int$ then $\sigma(v) \in Int$.
States are endowed with the operations shown in Figure 1.

In the operations of Figure 1, the variable res holds intermediate results. The
nop operation does nothing. The get_int^i (get_var^v) operation loads an integer i
(the value of v) into res. The put_var^v operation copies the value of res into v;
since there is no result, res is then removed. For every binary operation like = and +,
there is an operation on states. The operations scope and unscope are used before and
after a call to a function $f$, respectively. The former creates a new state in which
$f$ can execute. Its typing $p(f).\tau|_{-f}$ describes the input parameters (the variable
$f$ is not among them). The latter copies the result of $f$, i.e., the variable $f$ of its
second argument, into the variable res of the state before the call, i.e., its first
argument. The operation expand (restrict) adds variables to (removes variables from)
a state. The is_true (is_false) predicate checks whether res contains true (false).
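The contents of Figure 1 are not reproduced here, so the following Python sketch is only a hypothetical model of the state operations as described in prose: states are dictionaries, `res` is an ordinary key, true and false are encoded as 1 and -1, and expand initialises a new variable to 0. These encodings and all function names are assumptions of the sketch, not the paper's definitions.

```python
from typing import Callable, Dict

State = Dict[str, int]
RES = "res"  # the variable holding intermediate results

def nop(s: State) -> State:
    return dict(s)

def get_int(i: int) -> Callable[[State], State]:
    return lambda s: {**s, RES: i}

def get_var(v: str) -> Callable[[State], State]:
    return lambda s: {**s, RES: s[v]}

def put_var(v: str) -> Callable[[State], State]:
    def op(s: State) -> State:
        t = {k: x for k, x in s.items() if k != RES}  # res is removed
        t[v] = s[RES]
        return t
    return op

def binary(fn: Callable[[int, int], int]) -> Callable[[State, State], State]:
    """bop(l1)(l2): combine res of l1 (first operand) with res of l2 (second)."""
    return lambda l1, l2: {**l2, RES: fn(l1[RES], l2[RES])}

TRUE, FALSE = 1, -1  # false: a negative integer; true: any other integer
plus = binary(lambda a, b: a + b)
equals = binary(lambda a, b: TRUE if a == b else FALSE)

def is_true(s: State) -> bool:
    return s[RES] >= 0

def is_false(s: State) -> bool:
    return s[RES] < 0

def expand(v: str) -> Callable[[State], State]:
    return lambda s: {**s, v: 0}  # hypothetical initial value 0

def restrict(v: str) -> Callable[[State], State]:
    return lambda s: {k: x for k, x in s.items() if k != v}

def scope(params, args) -> Callable[[State], State]:
    """New state for the callee: formal parameters bound to the actual arguments."""
    return lambda s: {p: s[a] for p, a in zip(params, args)}

def unscope(f: str) -> Callable[[State, State], State]:
    """Copy the callee's result (its variable f) into res of the caller's state."""
    return lambda caller, callee: {**caller, RES: callee[f]}

# Evaluating y := x + 4 in the state [x -> 3].
l1 = get_var("x")({"x": 3})               # first operand in res
l2 = get_int(4)(restrict(RES)(l1))        # second operand, from the restricted state
r = plus(l1, l2)                          # res = 3 + 4
print(put_var("y")(r))                    # {'x': 3, 'y': 7}
```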

Since res plays a major role, we introduce the following abbreviations. 

Definition 4. For $\tau \in Typing$, $\sigma \in \Sigma_\tau$ and $e \in \mathcal{E}$, let $\tau^e = \tau[type_\tau(e)/res]$ and
$\lfloor\sigma\rfloor_\tau = restrict^{res}_{\tau^e}(\sigma)$ ($\tau$ will always be omitted).

We now define an instance of the maximal trace semantics of [?].

Definition 5. A trace $t \in \mathcal{T}$ is a non-empty sequence in $\Sigma$. A convergent trace
$\sigma_1 \to \cdots \to \sigma_n$ represents a terminated computation, a finite divergent trace
$\sigma_1 \to \cdots \to \tilde\sigma_n$ a yet non-terminated computation and an infinite divergent
trace $\sigma_1 \to \cdots \to \sigma_n \to \cdots$ a divergent computation. Arrows are given labels
$l \in Label$, like in $\xrightarrow{l}$, meaning that the interpreter was then in a watchpoint
labelled with $l$ (see Section 5). We assume we are given a hidden mark $\_ \notin Label$.

The first state of $t \in \mathcal{T}$ is $fst(t)$. The predicate $div(t)$ means that $t$ is divergent.
If $\neg div(t)$, the last state of $t$ is $lst(t)$. For $l \in Label$ and $\sigma \in \Sigma$, we let $\sigma \in_l t$
mean that $\sigma$ occurs in $t$ before an arrow $\xrightarrow{l}$.

The $\sqsubseteq$ ordering on traces (extension of finite divergent traces) is the minimal
relation such that $t_1 \sqsubseteq t_2$ if $t_1 = t_2$ or ($t_1$ is finite divergent and $t_2 = \bar{t}_1 \xrightarrow{l} t'$
for some $t' \in \mathcal{T}$ and $l \in Label \cup \{\_\}$), where $\bar{t}_1$ is $t_1$ deprived of the tilde sign.
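The ordering $\sqsubseteq$ only lets a finite divergent trace grow at its end. The Python sketch below checks it on finite traces, under an encoding chosen for illustration: a trace is a tuple of states, a tuple of arrow labels (the label after each state but the last, with `"_"` as the hidden mark) and a flag for the tilde; infinite traces are not representable in this encoding.

```python
from dataclasses import dataclass
from typing import Tuple

HIDDEN = "_"  # the hidden mark for unlabelled arrows

@dataclass(frozen=True)
class Trace:
    states: Tuple[str, ...]   # sigma_1, ..., sigma_n
    labels: Tuple[str, ...]   # labels[i] labels the arrow after states[i]
    divergent: bool = False   # True for a finite divergent trace (tilde)

    def __post_init__(self):
        assert len(self.states) >= 1 and len(self.labels) == len(self.states) - 1

def trace_leq(t1: Trace, t2: Trace) -> bool:
    """t1 below t2: equal, or t1 finite divergent and t2 extends it by an arrow."""
    if t1 == t2:
        return True
    n = len(t1.states)
    return (t1.divergent
            and len(t2.states) > n
            and t2.states[:n] == t1.states
            and t2.labels[: n - 1] == t1.labels)

t1 = Trace(("s1", "s2"), ("l1",), divergent=True)
t2 = Trace(("s1", "s2", "s3"), ("l1", HIDDEN))
print(trace_leq(t1, t2), trace_leq(t2, t1))
```

The single-state divergent trace of $\sigma$ is below every trace starting with $\sigma$, which is why it serves as the bottom in Proposition 1.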

Expressions and commands are denoted by a map from an initial state to a
trace $t$. In the first case, if $\neg div(t)$ then $lst(t)(res)$ is the value of the expression.



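Proposition 1. Given $\tau, \tau' \in Typing$, we extend the $\sqsubseteq$ ordering on traces pointwise to

$$C_{\tau,\tau'} = \{c \in \Sigma_\tau \mapsto \mathcal{T}_{\tau'} \mid \text{for every } \sigma \in \Sigma_\tau \text{ we have } fst(c(\sigma)) = \sigma\} . \qquad (1)$$

The pair $(C_{\tau,\tau'}, \sqsubseteq)$ is a pointed CPO whose bottom is $\bot_{\tau,\tau'} = \lambda\sigma \in \Sigma_\tau.\tilde\sigma$.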

Interpretations denote every $f \in \mathcal{F}$ with an element of $C_{p(f).\tau|_{-f},\,p(f).\tau|_{f}}$. Indeed,
its input variables are $p(f).s \setminus f$ and its output variable is named $f$.

Example 2. The program of Figure ?(a) is denoted by an interpretation which
denotes fib with an element of $C_{[n \mapsto int],[fib \mapsto int]}$.

Definition 6. The interpretations $\mathcal{I}$ are maps $I : \mathcal{F} \mapsto (\Sigma \mapsto \mathcal{T})$ such that
$I(f) \in C_{p(f)}$ for $f \in \mathcal{F}$. The $\sqsubseteq$ ordering is pointwise extended to $\mathcal{I}$.

Proposition 2. The semantic operations on denotations of Figure 2 (the
subscripts will usually be omitted) are monotonic w.r.t. $\sqsubseteq$.

The operation [op] applies an operation op from Figure 1. The operation ? joins
the denotation $E$ of an expression with that of one of two commands, depending
on is_true and is_false on the final states of $E$. Since commands do not receive
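$[op]_{\tau,\tau'} \in C_{\tau,\tau'}$, with $\tau, \tau' \in Typing$, $op : \Sigma_\tau \rightharpoonup \Sigma_{\tau'}$

$?_\tau : C_{\tau,\tau[int/res]} \times C_{\tau,\tau} \times C_{\tau,\tau} \mapsto C_{\tau,\tau}$, with $\tau \in Typing$, $res \notin dom(\tau)$

$\otimes_{\tau,\tau',\tau''} : C_{\tau,\tau'} \times C_{\tau',\tau''} \mapsto C_{\tau,\tau''}$, with $\tau, \tau', \tau'' \in Typing$

$\otimes_{bop,\tau} : C_{\tau,\tau[t_1/res]} \times C_{\tau,\tau[t_2/res]} \mapsto C_{\tau,\tau'}$, with $\tau, \tau' \in Typing$, $res \notin dom(\tau)$, $t_1, t_2 \in Type$
and $bop : \Sigma_{\tau[t_1/res]} \mapsto \Sigma_{\tau[t_2/res]} \rightharpoonup \Sigma_{\tau'}$

$\bowtie_\tau(f(v_1, \dots, v_{\#p(f).s-1})) : \mathcal{I} \mapsto C_{\tau,\tau[int/res]}$, with $\tau \in Typing$, $\{v_1, \dots, v_{\#p(f).s-1}\} \subseteq dom(\tau)$

$$[op]_{\tau,\tau'}(\sigma) = \begin{cases} \sigma \to op(\sigma) & \text{if } op(\sigma) \text{ is defined} \\ \sigma \to \tilde\sigma & \text{otherwise} \end{cases}$$

$$?_\tau(E, S_1, S_2)(\sigma) = \begin{cases} E(\sigma) & \text{if } div(E(\sigma)) \\ E(\sigma) \to S_1(\lfloor lst(E(\sigma)) \rfloor) & \text{if } \neg div(E(\sigma)) \text{ and } is\_true_{\tau[int/res]}(lst(E(\sigma))) \\ E(\sigma) \to S_2(\lfloor lst(E(\sigma)) \rfloor) & \text{if } \neg div(E(\sigma)) \text{ and } is\_false_{\tau[int/res]}(lst(E(\sigma))) \end{cases}$$

$$(S_1 \otimes_{\tau,\tau',\tau''} S_2)(\sigma) = \begin{cases} S_1(\sigma) & \text{if } div(S_1(\sigma)) \\ S_1(\sigma) \to S_2(lst(S_1(\sigma))) & \text{otherwise} \end{cases}$$

$$(S_1 \otimes_{bop,\tau} S_2)(\sigma) = \begin{cases} S_1(\sigma) & \text{if } div(S_1(\sigma)) \\ S_1(\sigma) \to S_2(\lfloor l_1 \rfloor) & \text{if } \neg div(S_1(\sigma)) \text{ and } div(S_2(\lfloor l_1 \rfloor)) \\ S_1(\sigma) \to S_2(\lfloor l_1 \rfloor) \to \tilde{l}_2 & \text{if } \neg div(S_1(\sigma)),\ \neg div(S_2(\lfloor l_1 \rfloor)) \text{ and } bop_{\tau[t_1/res]}(l_1)(l_2) \text{ is undefined} \\ S_1(\sigma) \to S_2(\lfloor l_1 \rfloor) \to bop_{\tau[t_1/res]}(l_1)(l_2) & \text{otherwise} \end{cases}$$

where $l_1 = lst(S_1(\sigma))$ and $l_2 = lst(S_2(\lfloor l_1 \rfloor))$

$$\bowtie_\tau(f(v_1, \dots, v_n))(I)(\sigma) = \begin{cases} i & \text{if } div(i) \\ i \to unscope_\tau(\sigma)(lst(i)) & \text{otherwise} \end{cases}$$

where $i = I(f)(scope^{v_1, \dots, v_n}_\tau(\sigma))$.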



Fig. 2. The signature and the implementation of the semantic operations. 



a partial result in res, we restrict those states through $\lfloor\cdot\rfloor$. The operations $\otimes$
and $\otimes_{bop}$ join two denotations $S_1$ and $S_2$. Divergent traces in $S_1$ are not joined,
since they represent an incomplete computation. Moreover, $\otimes_{bop}$ applies a binary
operation bop to the final states of $S_1$ and $S_2$ ($\lfloor\cdot\rfloor$ removes res from the final states
of $S_1$). The operation $\bowtie$ calls a function by using an interpretation.

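Example 3. Assume that $\tau$ is such that $\Sigma_\tau$ contains exactly three distinct states
$\sigma_1$, $\sigma_2$ and $\sigma_3$. Consider $S_1, S_2 \in C_{\tau,\tau}$ such that

$S_1(\sigma_1) = \sigma_1 \to \sigma_2 \xrightarrow{l_1} \sigma_3$ $\qquad S_2(\sigma_1) = \sigma_1 \to \tilde\sigma_2$

$S_1(\sigma_2) = \sigma_2 \xrightarrow{l_2} \tilde\sigma_1$ $\qquad S_2(\sigma_2) = \sigma_2 \xrightarrow{l_1} \sigma_3 \xrightarrow{l_2} \tilde\sigma_1$

$S_1(\sigma_3) = \sigma_3 \xrightarrow{l_2} \sigma_1 \xrightarrow{l_3} \sigma_3$ $\qquad S_2(\sigma_3) = \sigma_3 \xrightarrow{l_1} \tilde\sigma_1$ .

Let $S = S_1 \otimes S_2$. We have

$S(\sigma_1) = \sigma_1 \to \sigma_2 \xrightarrow{l_1} \sigma_3 \to \sigma_3 \xrightarrow{l_1} \tilde\sigma_1$ $\qquad S(\sigma_2) = \sigma_2 \xrightarrow{l_2} \tilde\sigma_1$

$S(\sigma_3) = \sigma_3 \xrightarrow{l_2} \sigma_1 \xrightarrow{l_3} \sigma_3 \to \sigma_3 \xrightarrow{l_1} \tilde\sigma_1$ .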
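$\mathcal{E}_\tau\llbracket i \rrbracket I = [get\_int^i_\tau]$
$\mathcal{E}_\tau\llbracket v \rrbracket I = [get\_var^v_\tau]$

$\mathcal{E}_\tau\llbracket e_1 = e_2 \rrbracket I = \mathcal{E}_\tau\llbracket e_1 \rrbracket I \otimes_= \mathcal{E}_\tau\llbracket e_2 \rrbracket I$
$\mathcal{E}_\tau\llbracket e_1 + e_2 \rrbracket I = \mathcal{E}_\tau\llbracket e_1 \rrbracket I \otimes_+ \mathcal{E}_\tau\llbracket e_2 \rrbracket I$

$\mathcal{E}_\tau\llbracket f(v_1, \dots, v_n) \rrbracket I = \bowtie_\tau(f(v_1, \dots, v_n))(I)$

$\mathcal{C}_\tau\llbracket v := e \rrbracket I = \mathcal{E}_\tau\llbracket e \rrbracket I \otimes [put\_var^v_{\tau^e}]$ $\qquad \mathcal{C}_\tau\llbracket c_1; c_2 \rrbracket I = \mathcal{C}_\tau\llbracket c_1 \rrbracket I \otimes \mathcal{C}_\tau\llbracket c_2 \rrbracket I$

$\mathcal{C}_\tau\llbracket \mathbf{let}\ v{:}t\ \mathbf{in}\ c \rrbracket I = [expand^{v:t}_\tau] \otimes \mathcal{C}_{\tau[t/v]}\llbracket c \rrbracket I \otimes [restrict^v_{\tau[t/v]}]$
$\mathcal{C}_\tau\llbracket \mathbf{if}\ e\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2 \rrbracket I = ?(\mathcal{E}_\tau\llbracket e \rrbracket I, \mathcal{C}_\tau\llbracket c_1 \rrbracket I, \mathcal{C}_\tau\llbracket c_2 \rrbracket I)$

$\mathcal{C}_\tau\llbracket \mathbf{while}\ e\ \mathbf{do}\ c \rrbracket I = lfp_{C_{\tau,\tau}}\ \lambda fix.\,?(\mathcal{E}_\tau\llbracket e \rrbracket I, \mathcal{C}_\tau\llbracket c \rrbracket I \otimes fix, [nop_\tau])$ .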

Fig. 3. The rules of our denotational trace semantics. 

By using the above operations, we build a denotational semantics for our
language. The map $\mathcal{E}_\tau\llbracket e \rrbracket : \mathcal{I} \mapsto C_{\tau,\tau^e}$ is shown in Figure 3 (for $\tau^e$, see Definition
4). The basic cases of the denotation of an expression are immediate. For cases
like $e_1$ bop $e_2$, the denotations of the two expressions are joined through $\otimes_{bop}$.
For function calls, we use $\bowtie$. The map $\mathcal{C}_\tau\llbracket\cdot\rrbracket : \mathcal{C} \times \mathcal{I} \mapsto C_{\tau,\tau}$ is shown in Figure 3.
The denotation of an assignment applies put_var to the final environments of the
denotation of the right-hand side. The introduction of a local variable $v$ evaluates
the code in a state expanded with $v$. Conditionals are modelled through the ?
operation. A while command is denoted by a least fixpoint over a conditional
[?]. It is well-defined since both $\mathcal{C}\llbracket\cdot\rrbracket$ and $\mathcal{E}\llbracket\cdot\rrbracket$ are monotonic (Proposition 2), and
because of Proposition 1. It is the least upper bound of an ascending transfinite
chain which starts from $\bot_{\tau,\tau}$.

The semantics of a program is a least fixpoint defined through [?].
Namely, for every $f \in \mathcal{F}$ we initialise (expand) the variable $f$, we compute
the denotation of its code and we remove all the variables except $f$.

Definition 7. By Props. 1 and 2, the semantics of $P = (\mathcal{F}, c, p)$ is defined as
$I_P$ where, letting $\tau = p(f).\tau$, $f \in \mathcal{F}$, $i$ a finite ordinal and $l$ a limit ordinal,



5 Watchpoint Semantics 

We specify a program point of interest (a watchpoint) through the command
watchpoint($l$), with $l \in Label$. We extend the rules in Figure 3 with

$\mathcal{C}_\tau\llbracket watchpoint(l) \rrbracket I = [watch]_{\tau,l}$ .

For $\tau \in Typing$ and $l \in Label$, $[watch]_{\tau,l} \in C_{\tau,\tau}$ creates a transition $\xrightarrow{l}$, i.e.,

$$[watch]_{\tau,l}(\sigma) = \sigma \xrightarrow{l} \sigma . \qquad (2)$$

Note that the typing $\tau_l$ in a watchpoint $l$ is statically known.
Assume we are not interested in the states before an unnamed transition, but
only in those before $\xrightarrow{l}$ with $l \in Label$, which can be selected through a map

$$w : \mathcal{T} \mapsto (Label \mapsto \wp(\Sigma))$$

(for its explicit definition, see Definition 9), pointwise extended to denotations
(Eq. (1)) and interpretations (Def. 6). Instead of computing $w(S_P)$ (Def. 7),
we want to push $w$ inside the semantics, i.e., compute the abstract watchpoint
semantics induced by the abstraction $w$.

5.1 Why a New Semantics 

Given $t \in \mathcal{T}$, by definition $w(t)$ is more abstract than $t$ and requires less space
(memory) to be stored. Let $\varsigma(x)$ be the space needed to store $x$. Since we are
particularly interested in the case when the program to be analysed is huge and
the number of watchpoints is relatively small, we can assume that

$$\varsigma(w(t)) < \varsigma(t) \qquad (3)$$

for $t \in \mathcal{T}$. Assume we want to compute $w(t)$ where $t$ is a trace for the command
$c_1; c_2$, i.e., the concatenation of a trace $t_1$ generated by $c_1$ and a trace $t_2$ generated
by $c_2$ from $lst(t_1)$. We can compute $t_1$, abstract it into $w(t_1)$, compute $t_2$, abstract
it into $w(t_2)$ and merge $w(t_1)$ and $w(t_2)$ into $w(t)$. For this we need at most
$m_1 = \max\{\varsigma(t_1) + \varsigma(w(t_1)),\ \varsigma(w(t_1)) + \varsigma(w(t_2)) + \varsigma(t_2)\}$ space (we never hold both
$t_1$ and $t_2$ in memory at the same time). Instead, if we compute $t$ and then $w(t)$,
we need at least $m_2 = \varsigma(t) + \varsigma(w(t)) = \varsigma(t_1) + \varsigma(t_2) + \varsigma(w(t_1)) + \varsigma(w(t_2))$ space.
Since $m_1 < m_2$, pushing $w$ inside the semantics induces a lighter calculation.
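The inequality $m_1 < m_2$ is plain arithmetic: $m_2 - m_1 = \min(\varsigma(t_1),\ \varsigma(t_2) + \varsigma(w(t_2)))$, so it only needs every size to be positive. The Python sketch below evaluates both quantities for a few hypothetical sizes (large traces, small abstractions, in the spirit of Eq. (3)).

```python
import random

def space_bounds(s_t1: int, s_t2: int, s_w1: int, s_w2: int):
    """m1: peak space when each sub-trace is abstracted as soon as it is built;
    m2: space when the whole trace t = t1 -> t2 is built before abstraction."""
    m1 = max(s_t1 + s_w1, s_w1 + s_w2 + s_t2)
    m2 = s_t1 + s_t2 + s_w1 + s_w2
    return m1, m2

rng = random.Random(0)
for _ in range(5):
    # hypothetical sizes: traces are large, their watchpoint abstractions small
    s_t1, s_t2 = rng.randint(10_000, 100_000), rng.randint(10_000, 100_000)
    s_w1, s_w2 = rng.randint(1, 100), rng.randint(1, 100)
    m1, m2 = space_bounds(s_t1, s_t2, s_w1, s_w2)
    print(f"m1={m1}  m2={m2}  saving={m2 - m1}")
```

With these proportions the saving is roughly the size of the smaller sub-trace, i.e., close to half of $m_2$ when $t_1$ and $t_2$ have similar sizes.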

This argument does not apply to the time of the analysis, since a state depends
on its predecessors in a trace. Hence all states must be considered during the
computation of the semantics, not just those before a $\xrightarrow{l}$ transition. But the
watchpoint semantics reduces the time cost of the analysis for other reasons.

1. A more abstract fixpoint computation might require fewer iterations, and
hence less space and time. Section 7 shows that this is very often the case.

2. Consider while $e$ do $c$, where $c$ contains some watchpoints, $c$ is denoted by $d$
and $e$ by $d'$. If we unfold $d$ after $d'$ until the fixpoint (Fig. 3), we then need to
scan a trace looking for the $\xrightarrow{l}$ transitions. If, instead, we had a denotation
$w(d)$ such that $w(d)(\sigma) = w(d(\sigma))$, we could just merge, during the fixpoint
calculation, the states for the same watchpoints, without scanning any trace.

3. Dealing with smaller data structures (as shown before) leads in general to
faster analyses. From Equation (3), this could sometimes mean that virtual
memory is not needed by the analyser, i.e., swapping is avoided.

4. Analyses based on a trace semantics use widening to avoid dealing with
infinite traces. For instance, [?] and [?] use regular trees, which add complexity
to the analyser. A watchpoint semantics does not need such a widening.

Consider how the analysis scales with the number of watchpoints. Of course, 
fewer watchpoints means lighter data structures, i.e., less space requirements. 






W.r.t. time, fewer watchpoints means faster analysis for points 1 and 3 above.
Finally, since for every watchpoint $l$ we need to compute the union (join) of the
states before a $\xrightarrow{l}$ transition, it even means fewer joins, i.e., a faster analysis.
These considerations have been experimentally verified in Section 7.

5.2 The Semantics 

We define here in detail the watchpoint semantics. To observe the states in the
watchpoints, we can abstract the traces into sets of states, one for each watchpoint.
But this abstraction induces optimal abstract operations that are too coarse, since the $\otimes$
operation (Figure 2) joins the traces through their last state. Thus, for better
precision, we abstract the traces into watchpoint traces, i.e., a set of states for every
watchpoint, collected into an element of $W^w$, together with the final state.

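Definition 8. Let

$$W^w = \{w \in Label \mapsto \wp(\Sigma) \mid \text{given } l \in Label \text{ we have } w(l) \in \wp(\Sigma_{\tau_l})\} .$$

A $w \in W^w$ is finite if $w(l)$ is finite for every $l \in Label$. The set $W^w$ is a complete
lattice ordered w.r.t. the pointwise extension of $\subseteq$. Lub and glb are (pointwise)
$\cup$ and $\cap$, and its bottom is $\bot = \lambda l \in Label.\emptyset$.

The set of watchpoint traces is $\mathcal{T}^w = \cup_{\tau \in Typing} \mathcal{T}^w_\tau$ where, for $\tau \in Typing$,

$$\mathcal{T}^w_\tau = \{(w, s) \mid w \in W^w,\ s \in \Sigma_\tau \cup \{\sim\} \text{ and if } s \neq \sim \text{ then } w \text{ is finite}\} .$$

They are ordered as $(w, s) \sqsubseteq^w (w, s)$ and $(w, \sim) \sqsubseteq^w (w', s)$ if and only if $w \sqsubseteq w'$.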

Elements of $W^w$ are extensionally represented as $[l_1 \mapsto S_1, \dots, l_n \mapsto S_n]$, meaning
that the label $l_i$ is mapped to the set of states $S_i$ for $i = 1, \dots, n$. If a label
is not contained in that enumeration, it is assumed to be mapped to $\emptyset$.

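Example 4. We have

$$([l_1 \mapsto \{\sigma_3\}, l_3 \mapsto \{\sigma_2\}], \sim) \sqsubseteq^w ([l_1 \mapsto \{\sigma_1, \sigma_3\}, l_2 \mapsto \{\sigma_2\}, l_3 \mapsto \{\sigma_2, \sigma_3\}], \sigma_1) .$$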

A watchpoint trace $(w, s)$ with $s \neq \sim$ represents all convergent traces which
end with $s$ and contain exactly the watchpoints in $w$. If $s = \sim$, instead, it
represents all divergent traces which contain exactly the watchpoints in $w$.

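Definition 9. Given $t \in \mathcal{T}$, we define $w(t) \in W^w$ and $\alpha^w : \mathcal{T} \mapsto \mathcal{T}^w$ as

$w(t)(l) = \{\sigma \mid \sigma \in_l t\}$ for every $l \in Label$,

$$\alpha^w(t) = \begin{cases} (w(t), \sim) & \text{if } div(t) \\ (w(t), lst(t)) & \text{otherwise.} \end{cases}$$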

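Example 5. Let $Label = \{l_1, l_2\}$. Then

$\alpha^w(\sigma_1 \to \sigma_2 \xrightarrow{l_1} \sigma_3 \to \sigma_4 \xrightarrow{l_2} \sigma_5 \xrightarrow{l_1} \sigma_6) = ([l_1 \mapsto \{\sigma_2, \sigma_5\}, l_2 \mapsto \{\sigma_4\}], \sigma_6)$

$\alpha^w(\sigma_1 \to \sigma_2 \xrightarrow{l_1} \sigma_3 \to \sigma_4 \xrightarrow{l_2} \sigma_5 \xrightarrow{l_1} \tilde\sigma_6) = ([l_1 \mapsto \{\sigma_2, \sigma_5\}, l_2 \mapsto \{\sigma_4\}], \sim)$ .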
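Definition 9 is directly executable on finite traces. In the Python sketch below, a trace is encoded (as an assumption of the sketch) by its states, the labels of its arrows (`"_"` for the hidden mark) and a divergence flag; it computes $\alpha^w$ for the trace $\sigma_1 \to \sigma_2 \xrightarrow{l_1} \sigma_3 \to \sigma_4 \xrightarrow{l_2} \sigma_5 \xrightarrow{l_1} \sigma_6$, with $\sigma_i$ written as `"si"`. Labels absent from the result are implicitly mapped to $\emptyset$.

```python
from typing import Dict, FrozenSet

DIV = "~"      # marks divergence in a watchpoint trace
HIDDEN = "_"   # hidden mark of unlabelled arrows

def w_of(states, labels) -> Dict[str, FrozenSet[str]]:
    """w(t)(l) = {sigma | sigma occurs in t before an arrow labelled l}."""
    w: Dict[str, set] = {}
    for sigma, lab in zip(states, labels):   # labels[i] follows states[i]
        if lab != HIDDEN:
            w.setdefault(lab, set()).add(sigma)
    return {l: frozenset(s) for l, s in w.items()}

def alpha_w(states, labels, divergent: bool):
    """alpha^w(t) = (w(t), ~) if t is divergent, (w(t), lst(t)) otherwise."""
    return (w_of(states, labels), DIV if divergent else states[-1])

states = ("s1", "s2", "s3", "s4", "s5", "s6")
labels = (HIDDEN, "l1", HIDDEN, "l2", "l1")
print(alpha_w(states, labels, divergent=False))
print(alpha_w(states, labels, divergent=True))
```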
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r n/; / ^ I (-L,Op(f 7 )) if Op(<7) is defined , , n.; / ^ / . rc ^ 

= { 1±: 4 otheLise. l^atch]Tia) = (±[{a}//]. .) 

E(cr) if E{cr),s — ~ 

{E{cr).w U Si{\-E{cr).s_i).w, Si{\-E{(t) .sj) .s) 

5 i, S2){cr) — ' if E{cr).s ^ ~ and is_trueT-[j^(./reg] (£J(( t).s) 

{E{(t).w U S2{\-E{cr).s_i).w, ^ 2 (l-E(ct) . s j) .s) 
if E{cr).s ^ ~ and is_falseT-[j^4/r-es] (-E'C'^)-^) 



(Si 0 "^ S2){<y) — 



Si(cr) if Si((t).s — ' 

{Si(cr),w U S2(Si(o-).s).u;, S2 (Si (cr) .s).s) otherwise. 



(Si ®ropS2)(a) = 



Si(cr) if Si(cr).s — ~ 

(Si(fT).u; U S 2 (lSi(o-).sj).ii;, ~) 

if Si((t).s 7^ ~ and (S2 (lSi((t).sj).s = ~ or fe is undefined) 
{Si{(t).w U S2(LSi((T).sj).ii;, b) otherwise, 
with b — bop^[t^/res]{Si{(r).s){S2{^Si{(T).sj).s) 

ix” (/(I'l, ■ • • ,i’p))(t)(o-) = I 4^’ ^ fr \r w * 4 * ■ i = /(/)(scope-''’’’i’'"’’'"(cr)) 

I ( 2 . It;, unscope-' (( t) ( z.s)) otherwise, 



Fig. 4 . The operations on watchpoint traces. 
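To make the role of the divergence marker in sequential composition concrete, here is a minimal Python sketch of $;^w$ from Fig. 4. It assumes a hypothetical encoding (states are strings, a watchpoint map is a dict from labels to frozensets, `"~"` stands for divergence) and two made-up denotations `s1` and `s2`; it illustrates the definition and is not the paper's Prolog code.

```python
from typing import Callable, Dict, FrozenSet, Tuple

DIVERGES = "~"  # divergence marker
Watch = Dict[str, FrozenSet[str]]
WatchTrace = Tuple[Watch, str]
Denotation = Callable[[str], WatchTrace]


def union_w(w1: Watch, w2: Watch) -> Watch:
    """Pointwise union of watchpoint maps; a missing label means the empty set."""
    out = dict(w1)
    for label, states in w2.items():
        out[label] = out.get(label, frozenset()) | states
    return out


def seq_w(s1: Denotation, s2: Denotation) -> Denotation:
    """Sequential composition of watchpoint denotations, as in Fig. 4."""
    def composed(sigma: str) -> WatchTrace:
        w1, end1 = s1(sigma)
        if end1 == DIVERGES:          # S1 diverges, so S2 is never executed
            return w1, end1
        w2, end2 = s2(end1)           # run S2 from the final state of S1
        return union_w(w1, w2), end2
    return composed


# Hypothetical denotations: s1 converges to "t" only from "a".
def s1(sigma: str) -> WatchTrace:
    return {"l2": frozenset({sigma})}, ("t" if sigma == "a" else DIVERGES)


def s2(sigma: str) -> WatchTrace:
    return {"l1": frozenset({sigma})}, DIVERGES


s = seq_w(s1, s2)
print(s("a"), s("b"))
```

From `"a"` the watchpoints of both denotations are collected, while from `"b"` the composition stops at the divergence of `s1`, exactly as in the first case of the definition.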



We now define the abstract counterpart of the set $C_{\tau,\tau'}$ of Equation (…).

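Proposition 3. Let $\tau, \tau' \in Typing$ and $W_{\tau,\tau'} = \Sigma_\tau \to T^w_{\tau'}$. The $\sqsubseteq^w$ order ($\alpha^w$) is pointwise extended to $W_{\tau,\tau'}$ ($C_{\tau,\tau'}$). $\langle W_{\tau,\tau'}, \sqsubseteq^w\rangle$ is a pointed CPO with bottom $\lambda\sigma \in \Sigma_\tau.\langle\bot, {\sim}\rangle$, and $\alpha^w$ is well-defined, onto, strict and additive.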

Proposition 4. The operations in Fig. 4, whose signatures are the $\alpha^w$ abstraction of those in Fig. …, are monotonic and $\alpha^w$-optimal w.r.t. those in Fig. ….



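Example 6. Consider the concrete denotations of Example …. Let $S^w_1 = \alpha^w(S_1)$, $S^w_2 = \alpha^w(S_2)$ and $S^w = \alpha^w(S)$. We have

$$S^w_1(\sigma_2) = \langle[l_2 \mapsto \{\sigma_2\}], {\sim}\rangle \qquad S^w_1(\sigma_3) = \langle[l_2 \mapsto \{\sigma_3\}, l_3 \mapsto \{\sigma_1\}], \sigma_3\rangle \qquad S^w_2(\sigma_3) = \langle[l_1 \mapsto \{\sigma_3\}], {\sim}\rangle.$$

Moreover, we have that $S^w$ is

$$S^w(\sigma_2) = \langle[l_2 \mapsto \{\sigma_2\}], {\sim}\rangle \qquad S^w(\sigma_3) = \langle[l_1 \mapsto \{\sigma_3\}, l_2 \mapsto \{\sigma_3\}, l_3 \mapsto \{\sigma_1\}], {\sim}\rangle,$$

which is exactly $S^w_1 \,;^w S^w_2$.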

Like in Section …, we define a watchpoint semantics $S^w_P$. By Propositions 3 and 4, it computes the same information about watchpoints as our trace semantics, i.e.,

$$\alpha^w(S_P) = S^w_P. \tag{4}$$



138 Fausto Spoto 



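$$[op]^{co}(\eta) = \langle\bot, op(\eta)\rangle \qquad [watch_l]^{co}(\eta) = \langle\bot[\eta/l], \eta\rangle$$

$$?^{co}(E, S_1, S_2)(\eta) = \langle E(\eta).w \cup S_1(\lfloor\eta_t\rfloor).w \cup S_2(\lfloor\eta_f\rfloor).w,\ S_1(\lfloor\eta_t\rfloor).\eta \cup S_2(\lfloor\eta_f\rfloor).\eta\rangle$$

where $\eta_t = is\_true_{\tau[int/res]}(E(\eta).\eta)$ and $\eta_f = is\_false_{\tau[int/res]}(E(\eta).\eta)$

$$(S_1 \,;^{co} S_2)(\eta) = \langle S_1(\eta).w \cup S_2(S_1(\eta).\eta).w,\ S_2(S_1(\eta).\eta).\eta\rangle$$

$$(S_1 \otimes^{co}_{bop} S_2)(\eta) = \langle S_1(\eta).w \cup S_2(\lfloor S_1(\eta).\eta\rfloor).w,\ bop_{\tau[t_l/res]}(S_1(\eta).\eta)(S_2(\lfloor S_1(\eta).\eta\rfloor).\eta)\rangle$$

$$(f(v_1, \ldots, v_n))^{co}(I)(\eta) = \langle i.w,\ unscope_f(\eta)(i.\eta)\rangle \quad \text{where } i = I(f)(scope_f^{v_1, \ldots, v_n}(\eta))$$

Fig. 5. The operations on collecting watchpoint traces.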

The collecting or static semantics is the powerset lifting of $S^w_P$. It works over $\wp(W_{\tau,\tau'})$; that is, it models properties of watchpoint denotations. Since we are interested in properties of states, we define below a semantics $S^{co}_P$ which works over watchpoint traces of sets of states. It is an AI of that powerset lifting and will be called collecting though, strictly speaking, the real collecting semantics is the powerset lifting of $S^w_P$ itself.

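Definition 10. The set of collecting watchpoint traces is $T^{co} = \cup_{\tau \in Typing} T^{co}_\tau$ where, letting $\tau \in Typing$ and $W^{co}_\tau = W^w_\tau$ (Definition …), $T^{co}_\tau = \{\langle w, \eta\rangle \mid w \in W^{co}_\tau \text{ and } \eta \in \wp(\Sigma_\tau)\}$, ordered as $\langle w_1, \eta_1\rangle \sqsubseteq^{co} \langle w_2, \eta_2\rangle$ iff $w_1 \subseteq w_2$ and $\eta_1 \subseteq \eta_2$.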

Denotations are identified by their values on singleton sets. Denotations with more than one argument will be useful at the end of this section. This is formalised below.

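Proposition 5. Let $n \geq 1$ and $\tau_1, \ldots, \tau_n, \tau' \in Typing$. Let $CO_{\tau_1,\ldots,\tau_n,\tau'}$ be

$$\left\{ co : \wp(\Sigma_{\tau_1}) \to \cdots \to \wp(\Sigma_{\tau_n}) \to T^{co}_{\tau'} \ \middle|\ co(\eta_1)\cdots(\eta_n) = \left\langle \bigcup_{\sigma_1 \in \eta_1, \ldots, \sigma_n \in \eta_n} co(\{\sigma_1\})\cdots(\{\sigma_n\}).w,\ \bigcup_{\sigma_1 \in \eta_1, \ldots, \sigma_n \in \eta_n} co(\{\sigma_1\})\cdots(\{\sigma_n\}).\eta \right\rangle \right\}.$$

The order of Definition 10 is pointwise extended to denotations $CO$. The pair $\langle CO_{\tau_1,\ldots,\tau_n,\tau'}, \sqsubseteq^{co}\rangle$ is a complete lattice with bottom $\lambda\eta_1 \ldots \lambda\eta_n.\langle\bot, \emptyset\rangle$.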

A collecting watchpoint trace represents a set of watchpoint traces. This abstraction induces optimal abstract counterparts of the operations in Figure 4.

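Proposition 6. Let $\alpha^{co} : \wp(T^w_\tau) \to T^{co}_\tau$ be $\alpha^{co}(S) = \langle \cup_{t \in S} t.w,\ \{t.s \mid t \in S \text{ and } t.s \neq {\sim}\}\rangle$. Its extension $\alpha^{co} : \wp(W_{\tau,\tau'}) \to CO_{\tau,\tau'}$, for $\tau, \tau' \in Typing$, given by $(\alpha^{co}(W))(\eta) = \alpha^{co}(\{w(\sigma) \mid w \in W \text{ and } \sigma \in \eta\})$ for $\eta \in \wp(\Sigma_\tau)$, is well-defined, onto, strict and additive (hence, the abstraction of a Galois insertion).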

Proposition 7. The operations in Figure 5 are monotonic and $\alpha^{co}$-optimal w.r.t. the pointwise extension of those in Figure 4.

We define a collecting watchpoint semantics $S^{co}_P$. We have that $S^{co}_P$ is the $\alpha^{co}$ abstraction of the powerset lifting of $S^w_P$. The operations in Figure 5 use objects in $\wp(\Sigma)$ (like $S_1(\eta).\eta$ in $;^{co}$), $W^{co}$ (like $S_1(\eta).w$ in $\otimes^{co}_{bop}$) and $CO$ (like $S_1$ in $;^{co}$). To simplify the abstraction of Section 6, we compile them in terms of smaller operations over $CO$ only, given in Figure 6. The compilation is shown in Figure 7.

Proposition 8. The operations in Figure 7 and those in Figure 5 are the same.
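$$\circ : CO_{\tau_1,\ldots,\tau_n,\tau'} \times CO_{\tau,\tau_1} \times \cdots \times CO_{\tau,\tau_n} \to CO_{\tau,\tau'} \qquad [watch]_l : CO_{\tau,\tau} \qquad \cup : CO_{\tau,\tau'} \times CO_{\tau,\tau'} \to CO_{\tau,\tau'}$$

$$[op] = \lambda\eta_1 \ldots \lambda\eta_n.\langle\bot, op(\eta_1, \ldots, \eta_n)\rangle \qquad [watch]_l = \lambda\eta.\langle\bot[\eta/l], \eta\rangle$$

$$T.w = \lambda\eta.\langle T(\eta).w, \emptyset\rangle \qquad T \circ (T_1, \ldots, T_n) = \lambda\eta.T(T_1(\eta).\eta)\cdots(T_n(\eta).\eta)$$

$$T_1 \cup T_2 = \lambda\eta.\langle T_1(\eta).w \cup T_2(\eta).w,\ T_1(\eta).\eta \cup T_2(\eta).\eta\rangle$$

Fig. 6. A minimal set of operations over CO.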



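$$\lfloor T\rfloor = [restrict_{res}] \circ T \qquad [op]^{co} = [op] \qquad [watch_l]^{co} = [watch]_l$$

$$?^{co}(E, S_1, S_2) = E.w \cup (S_1 \circ \lfloor T_t\rfloor) \cup (S_2 \circ \lfloor T_f\rfloor)$$

where $T_t = [is\_true_{\tau[int/res]}] \circ E$ and $T_f = [is\_false_{\tau[int/res]}] \circ E$

$$S_1 \,;^{co} S_2 = S_1.w \cup (S_2 \circ S_1)$$

$$S_1 \otimes^{co}_{bop} S_2 = S_1.w \cup (S_2 \circ \lfloor S_1\rfloor).w \cup [bop_{\tau[t_l/res]}] \circ (S_1, S_2 \circ \lfloor S_1\rfloor)$$

$$(f(v_1, \ldots, v_n))^{co}(I) = T_I.w \cup [unscope_f] \circ ([nop], T_I) \quad \text{where } T_I = I(f) \circ [scope_f^{v_1, \ldots, v_n}]$$

Fig. 7. The operations in Figure 5 in terms of those in Figure 6.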
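Proposition 8 can be spot-checked on a small example. The Python sketch below is an illustration under stated assumptions, not the paper's code: states are strings, a watchpoint map is a dict from labels to frozensets, and the denotations `S1` and `S2` are hypothetical. Denotations are built from their values on singletons, as in Proposition 5, and the direct definition of $;^{co}$ from Fig. 5 is compared with its compiled form $S_1.w \cup (S_2 \circ S_1)$ from Fig. 7 on every set of states. Only the one-argument case of $\circ$ is modelled.

```python
from itertools import chain, combinations
from typing import Callable, Dict, FrozenSet, Tuple

Watch = Dict[str, FrozenSet[str]]
CTrace = Tuple[Watch, FrozenSet[str]]      # <w, eta>
CO = Callable[[FrozenSet[str]], CTrace]


def wunion(w1: Watch, w2: Watch) -> Watch:
    out = dict(w1)
    for label, states in w2.items():
        out[label] = out.get(label, frozenset()) | states
    return out


def lift(single: Callable[[str], CTrace]) -> CO:
    """Extend values on singletons to all sets of states (Proposition 5)."""
    def co(eta: FrozenSet[str]) -> CTrace:
        w: Watch = {}
        out: FrozenSet[str] = frozenset()
        for sigma in eta:
            w_s, out_s = single(sigma)
            w, out = wunion(w, w_s), out | out_s
        return w, out
    return co


# Primitives in the style of Fig. 6 (one-argument case of the composition).
def dot_w(t: CO) -> CO:                  # T.w
    return lambda eta: (t(eta)[0], frozenset())


def compose(t: CO, t1: CO) -> CO:        # T o (T1)
    return lambda eta: t(t1(eta)[1])


def cup(t1: CO, t2: CO) -> CO:           # T1 U T2
    return lambda eta: (wunion(t1(eta)[0], t2(eta)[0]), t1(eta)[1] | t2(eta)[1])


def seq_direct(s1: CO, s2: CO) -> CO:    # ;co as written in Fig. 5
    def f(eta: FrozenSet[str]) -> CTrace:
        w1, out1 = s1(eta)
        w2, out2 = s2(out1)
        return wunion(w1, w2), out2
    return f


def seq_compiled(s1: CO, s2: CO) -> CO:  # ;co as compiled in Fig. 7
    return cup(dot_w(s1), compose(s2, s1))


# Hypothetical denotations over the states a, b, c.
S1 = lift(lambda s: {"a": ({"l1": frozenset({"a"})}, frozenset({"b"})),
                     "b": ({}, frozenset({"c"})),
                     "c": ({}, frozenset())}[s])
S2 = lift(lambda s: {"a": ({}, frozenset({"a"})),
                     "b": ({"l2": frozenset({"b"})}, frozenset({"a"})),
                     "c": ({"l2": frozenset({"c"})}, frozenset({"c"}))}[s])

STATES = ["a", "b", "c"]
ALL_SETS = [frozenset(c) for c in chain.from_iterable(
    combinations(STATES, k) for k in range(len(STATES) + 1))]
```

Passing through $S_1.w$ separately is what makes the compiled form work: $\circ$ keeps only the watchpoints of the outer denotation, so the watchpoints of $S_1$ must be added back explicitly.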



6 From Abstract Domains to Abstract Semantics

We show here how every abstraction of the domain of states (Definition …) induces an abstraction of the denotations $CO$ (Equation (…)) and of their operations (Figure 6), and hence of the collecting watchpoint semantics of the last section. This reduces the definition of a static analysis to the definition of abstract states.

Every abstract denotational semantics works over abstract denotations, which are maps from abstract inputs to abstract outputs. In our watchpoint semantics, the abstract outputs are actually abstract traces. The problem here is how to define the concretisation of such abstract denotations: if a concrete state belongs to two abstract inputs, how should it behave in the concretisation? We do not consider this problem in detail here, since it has already been studied in a more general setting; see, for instance, the functional partitioning technique in […]. Instead, we assume here that in the lattice of abstract states there exists a set of union-irreducible states in terms of which all other abstract states can be expressed. This condition holds for the case of sign analysis shown in Section 7. For $\tau \in Typing$, let $(D_\tau, \sqsubseteq)$ be a complete lattice and $\alpha^D_\tau$ and $\gamma^D_\tau$ the abstraction and concretisation maps of a Galois insertion from $(\wp(\Sigma_\tau), \subseteq)$ to $(D_\tau, \sqsubseteq)$ (typings will often be omitted).

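Definition 11. Let $W^a = \{w \in Label \to D \mid \text{for } l \in Label \text{ we have } w(l) \in D_{\tau_l}\}$. The set of abstract watchpoint traces is $T^a = \cup_{\tau \in Typing} T^a_\tau$ where

$$T^a_\tau = \{\langle w, d\rangle \mid w \in W^a \text{ and } d \in D_\tau\} \quad \text{for every } \tau \in Typing,$$

ordered as $\langle w_1, d_1\rangle \sqsubseteq^a \langle w_2, d_2\rangle$ if and only if $w_1 \sqsubseteq w_2$ (pointwise) and $d_1 \sqsubseteq d_2$.

The map $\alpha^D$ is extended to $T^{co}_\tau$ as $\alpha^D(\langle w, \eta\rangle) = \langle \lambda l.\alpha^D(w(l)), \alpha^D(\eta)\rangle$.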
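$$[op]^a = \lambda d_1 \ldots \lambda d_n.\langle\bot, \alpha^D(op(\gamma^D(d_1), \ldots, \gamma^D(d_n)))\rangle \qquad [watch]^a_l = \lambda d.\langle\bot[d/l], d\rangle$$

$$T.w^a = \lambda d.\langle T(d).w, \bot\rangle$$

$$T \circ^a (T_1, \ldots, T_n) = \lambda d.\bigsqcup_{d' \in S} T(T_1(d').d)\cdots(T_n(d').d), \quad \text{with } S \in \rho(d)$$

$$T_1 \cup^a T_2 = \lambda d.\langle T_1(d).w \sqcup T_2(d).w,\ T_1(d).d \sqcup T_2(d).d\rangle$$

where $\sqcup$ is the best approximation of $\cup$ over $D$.

Fig. 8. The generic abstract counterparts of the operations of Figure 6.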



Definition 12. Let $\tau \in Typing$. The union-reductions of $d \in D_\tau$ are

$$\rho(d) = \{S \in \wp(D_\tau) \mid \gamma^D(d) = \cup_{d' \in S} \gamma^D(d') \text{ and } \#\rho(d') = 1 \text{ for every } d' \in S\}.$$

If $\#\rho(d) = 1$ (i.e., $\rho(d) = \{\{d\}\}$) we say that $d$ is union-irreducible. If every $d \in D_\tau$ is such that $\rho(d) \neq \emptyset$, we say that $D_\tau$ is union-reducible.



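Proposition 9. Assume that $D_\tau$ is union-reducible for every $\tau \in Typing$. Given $n \geq 1$ and $\tau_1, \ldots, \tau_n, \tau' \in Typing$, let $A_{\tau_1,\ldots,\tau_n,\tau'}$ be

$$\left\{ a : D_{\tau_1} \to \cdots \to D_{\tau_n} \to T^a_{\tau'} \ \middle|\ \begin{array}{l} a \text{ is monotonic and, given } 1 \leq i \leq n \text{ and } S \in \rho(d_i),\\ a(d_1)\cdots(d_i)\cdots(d_n) = \bigsqcup_{d' \in S} a(d_1)\cdots(d')\cdots(d_n) \end{array} \right\} \tag{6}$$

i.e., denotations in $A_{\tau_1,\ldots,\tau_n,\tau'}$ are identified by the union-irreducible elements.

Let $\alpha^a : CO_{\tau_1,\ldots,\tau_n,\tau'} \to A_{\tau_1,\ldots,\tau_n,\tau'}$ be $\alpha^a(co) = \alpha^D \circ co \circ \gamma^D$. The set $A_{\tau_1,\ldots,\tau_n,\tau'}$ is a complete lattice with bottom $\lambda d_1 \ldots \lambda d_n.\langle\bot, \bot\rangle$, and $\alpha^a$ is well-defined, onto, strict and additive (hence, the abstraction map of a Galois insertion).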

Proposition 10. Assume that $D_\tau$ is union-reducible for every $\tau \in Typing$. The operations in Figure 8 are the best approximations over $A$ of those in Figure 6 (note that $\circ^a$ is not the composition of functions).

In conclusion, given a union-reducible abstraction $D_\tau$ of $\wp(\Sigma_\tau)$ for every $\tau \in Typing$, and the best approximations over $D$ of the powerset extension of the operations of Figure … (used in $[op]^a$) and of $\cup$, we obtain an abstract watchpoint semantics, correct w.r.t. the collecting watchpoint semantics of Subsection 5.2. As said before, similar results can be obtained in the more general case of non-union-reducible lattices by using the functional partitioning technique of […].

7 Implementation 

We describe here our implementation in Prolog of the watchpoint semantics of Sections … and …, instantiated with sign analysis. It can be downloaded from http://www.sci.univr.it/~spoto/watch.tar.gz. We have chosen Prolog for fast prototyping, and sign analysis because it is a well-known, simple analysis.

The module analyser.pl implements the fixpoint calculation (Figure … and Definition …) by using the semantic operations (Figures … and …) implemented
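$$nop^s(\varsigma) = \varsigma \qquad (get\_int_i)^s(\varsigma) = \begin{cases} \varsigma[+/res] & \text{if } i \geq 0\\ \varsigma[-/res] & \text{if } i < 0 \end{cases}$$

$$(get\_var_v)^s(\varsigma) = \varsigma[\varsigma(v)/res] \qquad (put\_var_v)^s(\varsigma) = \varsigma[\varsigma(res)/v]|_{-res}$$

$$=^s(\varsigma_1)(\varsigma_2) = \begin{cases} \varsigma_2[-/res] & \text{if } \varsigma_1(res) = - \text{ and } \varsigma_2(res) = +, \text{ or vice versa}\\ \varsigma_2[u/res] & \text{otherwise} \end{cases}$$

$$+^s(\varsigma_1)(\varsigma_2) = \begin{cases} \varsigma_2[+/res] & \text{if } \varsigma_1(res) = \varsigma_2(res) = +\\ \varsigma_2[-/res] & \text{if } \varsigma_1(res) = \varsigma_2(res) = -\\ \varsigma_2[u/res] & \text{otherwise} \end{cases}$$

$$(scope_f^{v_1, \ldots, v_n})^s(\varsigma) = [\iota_1 \mapsto \varsigma(v_1), \ldots, \iota_n \mapsto \varsigma(v_n)]$$

where $\iota_1, \ldots, \iota_n$ are the formal parameters of $f$ given by $p(f)$

$$(unscope_f)^s(\varsigma_1)(\varsigma_2) = \varsigma_1[\varsigma_2(f)/res] \qquad (restrict_{vs})^s(\varsigma) = \varsigma|_{-vs} \qquad (expand_v)^s(\varsigma) = \varsigma[+/v]$$

$$is\_true^s(\varsigma) = \begin{cases} empty & \text{if } \varsigma(res) = -\\ \varsigma[+/res] & \text{otherwise} \end{cases} \qquad is\_false^s(\varsigma) = \begin{cases} empty & \text{if } \varsigma(res) = +\\ \varsigma[-/res] & \text{otherwise} \end{cases}$$

$$empty \cup^s \varsigma_1 = \varsigma_1 \cup^s empty = \varsigma_1 \qquad \varsigma_1 \cup^s \varsigma_2 = \lambda v \in dom(\tau).\begin{cases} \varsigma_1(v) & \text{if } \varsigma_1(v) = \varsigma_2(v)\\ u & \text{otherwise} \end{cases}$$

Fig. 9. The abstract operations over the domain of signs.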



in the module semantic.pl. The module typing.pl manipulates typings. The module domain.pl implements the abstract counterparts of the operations of Figure …. Only this module depends on the domain of analysis.

Our domain for sign analysis is similar to that in […].

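Definition 13. For every $\tau \in Typing$, let $S_\tau = \{empty\} \cup \{\varsigma : dom(\tau) \to \{+, -, u\}\}$. The abstraction map $\alpha : \wp(\Sigma_\tau) \to S_\tau$ is such that, for $X \neq \emptyset$ and $v \in dom(\tau)$,

$$\alpha(X)(v) = \begin{cases} + & \text{if } \sigma(v) \geq 0 \text{ for every } \sigma \in X\\ - & \text{if } \sigma(v) < 0 \text{ for every } \sigma \in X\\ u & \text{otherwise.} \end{cases}$$

Let $\leq$ be reflexive and let $+ \leq u$ and $- \leq u$. The set $S_\tau$ is ordered as $empty \sqsubseteq^s s$ for every $s \in S_\tau$ and $\varsigma_1 \sqsubseteq^s \varsigma_2$ if and only if $\varsigma_1(v) \leq \varsigma_2(v)$ for every $v \in dom(\tau)$. The optimal counterparts over $S$ of the powerset extension of the operations in Fig. … are (all but $\cup^s$) strict on $empty$. Otherwise, they are given in Figure 9.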
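The sign abstraction, its join and the list encoding used by the implementation can be sketched in Python as follows. The encoding is hypothetical (an abstract state is either the string `"empty"` or a dict from variable names to `"+"`, `"-"` or `"u"`), and two points are assumptions rather than statements of the text: that `"+"` covers the value 0 (which is consistent with the Fibonacci discussion, where freshly initialised variables are reported as positive) and that the empty set of states abstracts to `empty`.

```python
from typing import Dict, Iterable, List, Union

AbsState = Union[str, Dict[str, str]]  # "empty" or a map variable -> "+", "-", "u"


def alpha(states: Iterable[Dict[str, int]], variables: Iterable[str]) -> AbsState:
    """Sign abstraction of a set of concrete states (Definition 13).
    Assumption: "+" covers values >= 0, and the empty set maps to "empty"."""
    states = list(states)
    if not states:
        return "empty"
    result = {}
    for v in variables:
        if all(s[v] >= 0 for s in states):
            result[v] = "+"
        elif all(s[v] < 0 for s in states):
            result[v] = "-"
        else:
            result[v] = "u"
    return result


def join(c1: AbsState, c2: AbsState) -> AbsState:
    """Abstract union of Fig. 9: empty is neutral, disagreeing signs give u."""
    if c1 == "empty":
        return c2
    if c2 == "empty":
        return c1
    return {v: (c1[v] if c1[v] == c2[v] else "u") for v in c1}


def encode(c: AbsState) -> Union[str, List[str]]:
    """List encoding: signs listed in alphabetical order of the variable names."""
    return c if c == "empty" else [c[v] for v in sorted(c)]


print(encode({"a": "+", "c": "-", "b": "u"}))
```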

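Given $\tau \in Typing$, the union-irreducible elements of $S_\tau$ are $empty$ and those $\varsigma \in S_\tau$ such that $\varsigma(v) \neq u$ for every $v \in dom(\tau)$. If $\varsigma(v) = u$ for some $v \in dom(\tau)$, instead, the concretisation of $\varsigma$ can be shown to be the union of the concretisations of $\varsigma[+/v]$ and $\varsigma[-/v]$. Therefore, we have

$$\rho(empty) = \{\{empty\}\}$$

$$\rho(\varsigma) = \left\{ \left\{ \varsigma' \in S_\tau \ \middle|\ \begin{array}{l} \text{for all } v \in dom(\tau) \text{ we have } \varsigma'(v) \neq u\\ \text{and if } \varsigma(v) \neq u \text{ then } \varsigma(v) = \varsigma'(v) \end{array} \right\} \right\}$$

By the results of Section 6, the abstract denotations are maps whose domain is made of $empty$ and of all $\varsigma$ which never bind a variable to $u$. The values for the other elements of $S_\tau$ are induced.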
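The decomposition just described, which splits every $u$ into $+$ and $-$, can be sketched in Python as follows. It reuses a hypothetical encoding of abstract sign states (the string `"empty"` or a dict from variable names to `"+"`, `"-"` or `"u"`) and simply enumerates the union-irreducible states whose concretisations cover that of a given state.

```python
from itertools import product
from typing import Dict, List, Union

AbsState = Union[str, Dict[str, str]]  # "empty" or a map variable -> "+", "-", "u"


def is_union_irreducible(c: AbsState) -> bool:
    """empty, or a sign state that binds no variable to u."""
    return c == "empty" or all(sign != "u" for sign in c.values())


def irreducible_cover(c: AbsState) -> List[AbsState]:
    """Union-irreducible states whose concretisations cover that of c:
    every u is split into + and -, the other signs are kept."""
    if c == "empty":
        return ["empty"]
    names = sorted(c)
    choices = [("+", "-") if c[v] == "u" else (c[v],) for v in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


print(irreducible_cover({"a": "u", "b": "-"}))
```

A state with $k$ variables bound to $u$ is covered by $2^k$ union-irreducible states, which is why the abstract denotations only need to be tabulated on the latter.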

Elements of $S_\tau$ are implemented as the term empty or as lists of +, - and u, ordered alphabetically w.r.t. the names in $dom(\tau)$. For instance, if $\tau = [a \mapsto int, c \mapsto int, b \mapsto int]$ then $\varsigma = [a \mapsto +, c \mapsto -, b \mapsto u]$ is implemented as [+,u,-]. We are aware of cleverer implementations, but in this paper we focus on the semantics.

(a)

    T = {fib}
    p(fib) = ((fib, n), [fib ↦ int, n ↦ int])
    c(fib) = if (n =< 1) then
               %watchpoint(p1);
               fib := 1
             else
               %watchpoint(p2);
               let n1 : int in let n2 : int in
                 %watchpoint(p3);
                 n1 := n - 1;
                 %watchpoint(p4);
                 n2 := n - 2;
                 %watchpoint(p5);
                 fib := fib(n1) + fib(n2);
                 %watchpoint(p6)

(b)

    | ?- interpret.
    Analysing [fib] : iteration 1
    Analysing [fib] : iteration 2
    fixpoint reached

    Procedure : fib
    Input : empty    Output : empty
    Watchpoints : p3 : empty
    Input : [+]      Output : [+]
    Watchpoints : p3 : [+, +, +, +]
    Input : [-]      Output : [+]
    Watchpoints : p3 : empty

Fig. 10. The Fibonacci procedure and one of its possible analyses.

The input of the analyser is a Prolog term which represents the abstract syntax of a program. Figure 10(a) shows a program for the n-th Fibonacci number, with six possible watchpoints. The file fib.pl contains its abstract syntax. We load it with [fib]. and we analyse it with interpret. Figure 10(b) shows the result when only watchpoint p3 is not commented out. The input of fib is the value of n; its output is the value of the variable fib at its end. As can be seen, if we start with an empty set of states we never reach watchpoint p3. If we start with a state where n is positive, the output is positive and we reach watchpoint p3 with a state where fib, n, n1 and n2 are positive. Indeed, the initial value of a variable is 0 and in the else branch we have n > 1. Finally, if we start with a state where n is negative, the output is positive and watchpoint p3 is never reached. Indeed, if n < 0 the then branch is executed. If we started with an unknown value for n, we would obtain the least upper bound of the last two cases.



7.1 The Costs in Space and in Time of the Analysis 

To estimate the space used by our analyser independently of its implementation, we count the number of Prolog atoms contained in the denotations it computes (weight). Fig. 11 gives the weight for the analysis of fib (Fig. 10(a)) and pi (a Monte Carlo algorithm computing π), as a function of the number of active watchpoints. For now, consider only the lines marked with Abstract Interpretation. On the horizontal axis, an integer like 3 means that only watchpoints p1, p2 and p3 were active. As can be seen, the weight grows with the number of active watchpoints. When passing from 0 watchpoints to 1 watchpoint in Fig. 11(a) and from 10 watchpoints to 11 in Fig. 11(b), one more iteration is needed to reach the fixpoint. Thus fewer watchpoints do mean fewer iterations (Subs. …).
Fig. 11. The cost in space of the analysis w.r.t. the number of active watchpoints: (a) fib, (b) pi.



We expect the time of the analysis to grow with the number of watchpoints, proportionally to the cost of the abstract join (Subs. 5.1). Fig. 12 confirms this. The constant c is a fictitious cost added to the computation of the join. Note that domains more realistic than signs usually feature complex joins. Note again the jump when one more iteration is needed for the fixpoint calculation. The benchmark nested shows the same behaviour as Fig. 12. But if a benchmark contains neither recursive predicates nor conditionals nor iterative constructs, then the time for its analysis is independent of the number of watchpoints, as for arith, whose abstract execution tree is actually a finite trace.

Note that […] and […] do not provide a link to an implementation.



7.2 Abstract Compilation 

In Figure 10(a) we note that the denotation of the then branch is independent of the partial denotation computed for fib. Thus, it does not need to be computed at every iteration, unlike that of the else branch, which contains two calls to fib. However, the first part of the else branch, up to watchpoint p5, does not contain recursive calls and can be safely analysed only once. These optimisations are examples of abstract compilation (AC). Our analyser uses AC when the goal compile is invoked. The result is like that in Figure 10(b), with smaller space and time costs, as Figures 11 and 12(c) show for weight (space) and time, respectively. Moreover, Figure 12(c) shows that the time still depends on the number of watchpoints and on the cost of the join. Finally, Figure 13 shows that AC very often leads to major improvements, but is of no help with the flat benchmark arith.
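The size of the improvement reported in Fig. 13 can be quantified with a few lines of Python. The timings below are copied from the table; the row keys (benchmark and active watchpoints, with "none" for the empty set) are a hypothetical labelling chosen for the sketch. A ratio above 1 means that the run with AC was faster than the one with plain AI.

```python
# (time with AI, time with AC) in seconds, copied from Fig. 13.
TIMES = {
    ("fib", "none"): (5.92, 4.03),
    ("fib", "p1..p6"): (9.19, 5.28),
    ("pi", "none"): (16.61, 8.91),
    ("pi", "p1..p15"): (25.93, 10.58),
    ("arith", "none"): (303.12, 308.42),
    ("nested", "none"): (661.43, 369.99),
}


def speedups(times):
    """Ratio AI time / AC time, rounded to two decimals (> 1 means AC is faster)."""
    return {key: round(ai / ac, 2) for key, (ai, ac) in times.items()}


for key, ratio in speedups(TIMES).items():
    print(key, ratio)
```

The largest ratio is for pi with all watchpoints active, while arith, the flat benchmark, has a ratio just below 1, in line with the remark above.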

8 Conclusions 

We have shown that, if we are interested in the analysis of a program at a small set of watchpoints, it is worth abstracting a trace semantics into a lighter, compositional and still as precise watchpoint semantics. We have shown through an implementation that it is focussed, i.e., its complexity grows with the number
Fig. 12. The cost in time of the analysis w.r.t. the number of active watchpoints: (a) fib, (b) pi, (c) fib with AC.



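Benchmark | Watchpoints | AI/AC | Iterations | Time (seconds) | Weight (atoms)
--- | --- | --- | --- | --- | ---
fib | $\emptyset$ | AI | 2 | 5.92 | 160795
fib | $\emptyset$ | AC | 2 | 4.03 | 142299
fib | $\{p_1, \ldots, p_6\}$ | AI | 3 | 9.19 | 349631
fib | $\{p_1, \ldots, p_6\}$ | AC | 3 | 5.28 | 305363
pi | $\emptyset$ | AI | 2 | 16.61 | 833419
pi | $\emptyset$ | AC | 2 | 8.91 | 463294
pi | $\{p_1, \ldots, p_{15}\}$ | AI | 3 | 25.93 | 2107643
pi | $\{p_1, \ldots, p_{15}\}$ | AC | 3 | 10.58 | 1287127
arith | $\emptyset$ | AI | 1 | 303.12 | 7049327
arith | $\emptyset$ | AC | 1 | 308.42 | 7049327
nested | $\emptyset$ | AI | 3 | 661.43 | 14253268
nested | $\emptyset$ | AC | 3 | 369.99 | 8419626

Fig. 13. A comparison of abstract interpretation with abstract compilation.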



of watchpoints, and that abstract compilation significantly improves the fixpoint calculation.

The analysis process is defined as a fixpoint computation. For better efficiency, if a set of call patterns is known for some functions, this computation can be done on demand, simulating a top-down analysis. This means that the abstract denotations are enriched at fixpoint computation time whenever the behaviour of a function for a new input is needed.

Our results apply to the modular analysis of large programs and to the analysis inside smart cards, where memory requirements must be kept small.
Abstract. We present a parametric groundness analysis whose input and output are parameterized by a set of groundness parameters. The result of the analysis can be instantiated for different uses of the program. It can also be used to derive sufficient conditions for safely removing groundness checks for built-in calls in the program. The parametric groundness analysis is obtained by generalizing a non-parametric groundness analysis that uses the abstract domain Con. It is shown to be as precise as the non-parametric groundness analysis for any possible values of the groundness parameters. Experimental results of a prototype implementation of the parametric groundness analysis are given.
Keywords: Abstract Interpretation, Groundness Analysis, Logic programs.



1 Introduction 



In logic programming […], a real-world problem is modeled as a set of axioms and a general execution mechanism is used to solve the problem. While this allows a problem to be solved in a natural and declarative manner, the general execution mechanism incurs a performance penalty for most programs. This motivated much research into semantics-based analysis of logic programs […]. Groundness analysis is one of the most important analyses for logic programs. It provides answers to questions such as whether, at a program point, a variable is definitely bound to a ground term, that is, a term that contains no variables. This is useful not only to an optimizing compiler but also to other program manipulation tools. Many methods have been proposed for groundness analysis […].

This paper presents a new groundness analysis whose input and output are parameterized by a number of groundness parameters, hence called parametric groundness analysis. These parameters represent groundness information that is not available before analysis but can be provided after analysis. Providing such information instantiates the result of analysis. Instantiability implies reusability: a program module such as a library program can be analyzed once and the result can be instantiated for different uses of the program module. This improves the efficiency of analysis. Instantiability also makes the new groundness analysis amenable to program modifications, since modules that are not changed need not be re-analyzed. Groundness parameters in the input and the output of the new groundness analysis make it easier to derive a sufficient condition under which groundness checks for built-in calls in the program can be safely removed.

The parametric groundness analysis is obtained by generalizing a groundness analysis based on the abstract domain Con […]. Con is the least precise abstract domain for groundness analysis. The parametric groundness analysis is thus less precise than a groundness analysis that uses a more precise abstract domain, namely Pos […], Def […] or EPos [21]. However, a Con-based groundness analysis is much more efficient than groundness analyzers based on more precise abstract domains. By generalizing a Con-based groundness analysis, we obtain a parametric groundness analysis that is more efficient and scalable.

The parametric groundness analysis is performed by abstract interpretation […]. Abstract interpretation is a methodology for static program analysis whereby a program analysis is viewed as the execution of the program over a non-standard data domain. A number of frameworks have been proposed for abstract interpretation of logic programs […]. An abstract interpretation framework is an analysis engine that takes care of algorithmic issues that are common to a class of analyses, allowing the designer of an analysis to focus on issues that are specific to the analysis. This greatly simplifies the design and the presentation of a new analysis. The parametric groundness analysis will be presented in the abstract interpretation framework in […]. The adaptation to other frameworks can easily be made.

The remainder of the paper is organized as follows. Section 2 gives motivation for the parametric groundness analysis through an example. Section 3 gives basic notations and briefly describes the abstract interpretation framework in […]. Section 4 reformulates a non-parametric groundness analysis that is generalized in Section 5 to obtain the new groundness analysis. Section 6 provides performance results of a prototype implementation. In Section 7 we compare our work with related work. Section 8 concludes the paper. Proofs are omitted due to space limits.

2 Motivation 

In groundness analysis, we are interested in knowing which variables will definitely be instantiated to ground terms, and which will not, when the execution of the program reaches a program point. We use g and u to represent these two groundness modes of a variable. Let $MO \stackrel{def}{=} \{g, u\}$ and let $\leq$ be defined as $g \leq g$, $g \leq u$ and $u \leq u$. $(MO, \leq)$ is a complete lattice with infimum $g$ and supremum $u$. Let $\vee$ and $\wedge$ be the least upper bound and the greatest lower bound operators on $(MO, \leq)$, respectively.
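The two-point lattice $(MO, \leq)$ is small enough to write out directly. The Python sketch below is only an illustration (the string encoding of the modes is an assumption); it spells out $\leq$, $\vee$ and $\wedge$ and lets the test check that they agree with the order.

```python
MO = ("g", "u")
LEQ = {("g", "g"), ("g", "u"), ("u", "u")}  # the order <= on MO


def leq(a: str, b: str) -> bool:
    return (a, b) in LEQ


def lub(a: str, b: str) -> str:    # the operator \/ on (MO, <=)
    return "u" if "u" in (a, b) else "g"


def glb(a: str, b: str) -> str:    # the operator /\ on (MO, <=)
    return "g" if "g" in (a, b) else "u"


print(lub("g", "u"), glb("g", "u"))
```

Since $g$ is the infimum, "ground" is the more precise piece of information, and joining two modes can only lose groundness.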

Example 1. Consider the program and the initial goal in Figure 1. Let Ⓧ : Q ⇒ Ⓨ : R denote that if Q holds at the program point Ⓧ then R holds whenever the execution reaches the program point Ⓨ. Let X ↦ m denote that the groundness mode of X is m. A groundness analysis infers the following statements.

    ← Ⓐ treesort(Li, Lo). Ⓑ

    treesort(Li, Lo) ← list_to_tree(Li, T), tree_to_list(T, Lo).
    insert(I, void, tr(I, void, void)).
    insert(I, tr(E, L, R), tr(E, Ln, R)) ← Ⓒ I < E, insert(I, L, Ln).
    insert(I, tr(E, L, R), tr(E, L, Rn)) ← Ⓓ I >= E, insert(I, R, Rn).
    insert_list([H|L], T, Tn) ← insert(H, T, Tm), insert_list(L, Tm, Tn).
    insert_list([], T, T).
    list_to_tree(L, T) ← insert_list(L, void, T).
    tree_to_list(T, L) ← tree_to_list_aux(T, [], L).
    tree_to_list_aux(void, L, L).
    tree_to_list_aux(tr(I, L, R), O, N) ←
        tree_to_list_aux(R, O, L1), tree_to_list_aux(L, [I|L1], N).

Fig. 1. The treesort program from […]. Circled letters are not part of the program but locate program points.

    Ⓐ : (Li ↦ g) ∧ (Lo ↦ g) ⇒ Ⓒ : (I ↦ g) ∧ (E ↦ g)
    Ⓐ : (Li ↦ g) ∧ (Lo ↦ u) ⇒ Ⓒ : (I ↦ g) ∧ (E ↦ g)
    Ⓐ : (Li ↦ u) ∧ (Lo ↦ g) ⇒ Ⓒ : (I ↦ u) ∧ (E ↦ u)
    Ⓐ : (Li ↦ u) ∧ (Lo ↦ u) ⇒ Ⓒ : (I ↦ u) ∧ (E ↦ u)

These statements must be inferred independently of each other. The groundness of I and E at point Ⓒ depends on the groundness of Li and Lo at point Ⓐ in such a way that I and E are ground at point Ⓒ iff Li is ground at point Ⓐ. Thus, it would be desirable to have a groundness analysis which infers the following statement

    Ⓐ : (Li ↦ α) ∧ (Lo ↦ β) ⇒ Ⓒ : (I ↦ α) ∧ (E ↦ α)    (1)

where α and β are groundness parameters ranging over MO.

Such an analysis is parametric in the sense that its input Ⓐ : (Li ↦ α) ∧ (Lo ↦ β) and its output Ⓒ : (I ↦ α) ∧ (E ↦ α) are parameterized.

Statement (1) can be instantiated as follows. When the parameters α and β are assigned groundness modes from MO, the groundness of Li and Lo at point Ⓐ and the groundness of I and E at point Ⓒ are obtained by instantiation. The first statement inferred by the non-parametric groundness analysis is obtained from (1) by assigning g to both α and β. Instantiations can be made for the four different assignments of groundness modes to α and β.

Statement (1) can also be used to infer a sufficient condition on α and β under which the run-time check on the groundness of I and E in the built-in call I < E at point Ⓒ can be safely removed. Specifically, if α is assigned g then the run-time check can be safely removed. With a non-parametric groundness analysis, one needs to analyze the program four times to infer the sufficient condition.
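The instantiation of statement (1) and the derived sufficient condition can be illustrated with a short Python sketch. The function names `statement_1` and `check_removable` are hypothetical; the sketch simply encodes statement (1) as a function of α and β and enumerates the four assignments that a non-parametric analysis would have to analyze separately.

```python
from itertools import product

MO = ("g", "u")


def statement_1(alpha: str, beta: str):
    """Instantiate statement (1): modes of Li, Lo at point A and of I, E at point C."""
    at_a = {"Li": alpha, "Lo": beta}
    at_c = {"I": alpha, "E": alpha}
    return at_a, at_c


def check_removable(alpha: str, beta: str) -> bool:
    """The groundness check of I < E can be dropped if I and E are ground at C."""
    _, at_c = statement_1(alpha, beta)
    return at_c["I"] == "g" and at_c["E"] == "g"


for alpha, beta in product(MO, MO):
    print(alpha, beta, statement_1(alpha, beta)[1], check_removable(alpha, beta))
```

The check is removable exactly for the two assignments with α = g, independently of β, which is the sufficient condition stated above.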

3 Preliminaries 

Lattice Theory. A poset is a tuple $(A, \sqsubseteq)$ where $A$ is a set and $\sqsubseteq$ is a reflexive, anti-symmetric and transitive relation on $A$. Let $B \subseteq A$ and $u \in A$. $u$ is an upper bound of $B$ if $b \sqsubseteq u$ for each $b \in B$. $u$ is a least upper bound of $B$ if $u$ is an upper bound of $B$ and $u \sqsubseteq u'$ for any upper bound $u'$ of $B$. The least upper bound of $B$, if it exists, is unique and denoted $\sqcup B$. Lower bounds and the greatest lower bound are defined dually. $\sqcap B$ denotes the greatest lower bound of $B$.

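A complete lattice is a poset $(A, \sqsubseteq)$ such that $\sqcup B$ and $\sqcap B$ exist for any $B \subseteq A$. A complete lattice is denoted $(A, \sqsubseteq, \bot, \top, \sqcap, \sqcup)$ where $\bot \stackrel{def}{=} \sqcup\emptyset$ and $\top \stackrel{def}{=} \sqcap\emptyset$. Let $(A, \sqsubseteq, \bot, \top, \sqcap, \sqcup)$ be a complete lattice and $B \subseteq A$. $B$ is a Moore family if $\top \in B$ and $x_1 \sqcap x_2 \in B$ for any $x_1 \in B$ and $x_2 \in B$.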
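For a finite lattice, the Moore family condition above can be checked mechanically. The following Python sketch is an illustration only: it takes the powerset lattice of `{1, 2}` (meet is intersection, top is the full set) as an assumed example and tests the two clauses of the definition.

```python
from typing import Callable, FrozenSet, Iterable


def is_moore_family(family: Iterable[FrozenSet[int]], top: FrozenSet[int],
                    meet: Callable[[FrozenSet[int], FrozenSet[int]], FrozenSet[int]]) -> bool:
    """Check the definition above: top is in B and B is closed under binary meets."""
    family = set(family)
    return top in family and all(meet(x, y) in family for x in family for y in family)


TOP = frozenset({1, 2})


def intersect(x: FrozenSet[int], y: FrozenSet[int]) -> FrozenSet[int]:
    """The meet of the powerset lattice."""
    return x & y


print(is_moore_family([TOP, frozenset({1}), frozenset()], TOP, intersect))
```

The family $\{\{1,2\}, \{1\}, \{2\}\}$, for example, is not a Moore family, because $\{1\} \cap \{2\} = \emptyset$ is missing from it.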

Let $(A, \sqsubseteq_A)$ and $(B, \sqsubseteq_B)$ be two posets. A function $f : A \mapsto B$ is monotonic if $f(a_1) \sqsubseteq_B f(a_2)$ for any $a_1 \in A$ and $a_2 \in A$ such that $a_1 \sqsubseteq_A a_2$. Let $X \subseteq A$. We define $f(X) \stackrel{def}{=} \{f(x) \mid x \in X\}$. We sometimes use Church's lambda notation for functions, so that a function $f$ will be denoted $\lambda x.f(x)$.



Logic Programming. Let $\Sigma$ be a set of function symbols, $\Pi$ a set of predicate symbols, $V$ a denumerable set of variables and $U \subseteq V$. The set $T_{\Sigma,U}$ of terms over $\Sigma$ and $U$ is the smallest set containing $x$ in $U$ and $f(t_1, \ldots, t_n)$ with $f/n \in \Sigma$, $n \geq 0$ and $t_i \in T_{\Sigma,U}$ for $1 \leq i \leq n$. The set $A_{\Pi,\Sigma,U}$ of atoms over $\Pi$ and $T_{\Sigma,U}$ consists of $p(t_1, \ldots, t_n)$ with $p/n \in \Pi$, $n \geq 0$ and $t_i \in T_{\Sigma,U}$ for $1 \leq i \leq n$. Let $vars(O)$ denote the set of variables in $O$. A substitution is a mapping $\theta : V \mapsto T_{\Sigma,V}$ such that $\{x \in V \mid x \neq \theta(x)\}$, denoted $dom(\theta)$, is finite. The range of $\theta$ is $range(\theta) \stackrel{def}{=} \cup_{x \in dom(\theta)} vars(\theta(x))$. $\theta \upharpoonright U$ is a substitution such that $(\theta \upharpoonright U)(x) = \theta(x)$ for $x \in U$ and $(\theta \upharpoonright U)(x) = x$ for $x \notin U$. A substitution $\theta : V \mapsto T_{\Sigma,V}$ is uniquely extended to a homomorphism $\theta : T_{\Sigma,V} \mapsto T_{\Sigma,V}$. A renaming substitution is a bijective mapping from $V$ to $V$. Let $Sub$ be the set of idempotent substitutions.

An equation is a formula $l = r$ where either $l, r \in T_{\Sigma,V}$ or $l, r \in A_{\Pi,\Sigma,V}$. The set of all equations is denoted $Eqn$. For a set of equations $E \in \wp(Eqn)$, a unifier of $E$ is a substitution $\theta$ such that $\theta(l) = \theta(r)$ for each $(l = r) \in E$. $E$ is called unifiable if $E$ has a unifier. A unifier $\theta$ of $E$ is a most general unifier if for any other unifier $\sigma$ of $E$ there is a substitution $\eta$ such that $\sigma = \eta \circ \theta$, where $\circ$ denotes function composition. All most general unifiers of $E$ are equivalent modulo renaming. Let $mgu : \wp(Eqn) \mapsto Sub \cup \{fail\}$ return either a most general unifier for $E$ if $E$ is unifiable or $fail$ otherwise. $mgu(\{l = r\})$ is also written as $mgu(l, r)$. Let $\theta \circ fail \stackrel{def}{=} fail$ and $fail \circ \theta \stackrel{def}{=} fail$ for any $\theta \in Sub \cup \{fail\}$.
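For readers who want to experiment with $mgu$, here is a compact Python sketch of a standard syntactic unification algorithm with occurs check. It is not the unification procedure of any particular Prolog system or of the framework used in this paper. The term encoding is hypothetical: variables are strings starting with an uppercase letter, constants are other strings, and compound terms are tuples `(functor, arg1, ..., argn)`. `None` plays the role of $fail$, and the result is made idempotent by resolving all bindings.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding: variables are strings starting with an uppercase
# letter, constants are other strings, compound terms are tuples
# (functor, arg1, ..., argn).
Term = Union[str, Tuple]
Subst = Dict[str, Term]


def is_var(t: Term) -> bool:
    return isinstance(t, str) and t[:1].isupper()


def walk(t: Term, s: Subst) -> Term:
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v: str, t: Term, s: Subst) -> bool:
    t = walk(t, s)
    if is_var(t):
        return v == t
    if isinstance(t, tuple):
        return any(occurs(v, arg, s) for arg in t[1:])
    return False


def resolve(t: Term, s: Subst) -> Term:
    """Apply the bindings exhaustively, so that the result is idempotent."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(arg, s) for arg in t[1:])
    return t


def mgu(equations: List[Tuple[Term, Term]]) -> Optional[Subst]:
    """Syntactic unification with occurs check; None plays the role of fail."""
    s: Subst = {}
    stack = list(equations)
    while stack:
        left, right = stack.pop()
        left, right = walk(left, s), walk(right, s)
        if left == right:
            continue
        if is_var(left) or is_var(right):
            v, t = (left, right) if is_var(left) else (right, left)
            if occurs(v, t, s):
                return None
            s[v] = t
        elif (isinstance(left, tuple) and isinstance(right, tuple)
              and left[0] == right[0] and len(left) == len(right)):
            stack.extend(zip(left[1:], right[1:]))
        else:
            return None
    return {v: resolve(t, s) for v, t in s.items()}


# mgu(p(X, f(Y)), p(f(a), X)) binds X to f(a) and Y to a.
print(mgu([(("p", "X", ("f", "Y")), ("p", ("f", "a"), "X"))]))
```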

Let $VI$ be the set of variables of interest. $VI$ is usually the set of the variables occurring in the program. We will use a fixed renaming substitution $\Psi$ such that $\Psi(VI) \cap VI = \emptyset$. $\Psi$ is called a tagging substitution in […].

Abstract Interpretation. Two semantics of the program are involved in abstract interpretation. One is called concrete and the other abstract. In a compositional definition of semantics, the concrete semantics is defined in terms of a group of semantic functions $f_i : D_i \mapsto E_i$ and the abstract semantics is defined in terms of another group of semantic functions $f_i^\sharp : D_i^\sharp \mapsto E_i^\sharp$ such that each abstract semantic function $f_i^\sharp$ simulates its corresponding concrete semantic function $f_i$. Proving the correctness of the abstract semantics (the program analysis) with respect to the concrete semantics is thus reduced to proving the correctness of each abstract semantic function $f_i^\sharp$ with respect to its corresponding concrete semantic function $f_i$. The latter can be done using the Moore family approach […] when the concrete domains $D_i$ and $E_i$ are complete lattices. Let $\gamma_{D_i^\sharp} : D_i^\sharp \mapsto D_i$ and $\gamma_{E_i^\sharp} : E_i^\sharp \mapsto E_i$ be monotonic functions such that $\gamma_{D_i^\sharp}(D_i^\sharp)$ and $\gamma_{E_i^\sharp}(E_i^\sharp)$ are Moore families. Then $f_i^\sharp : D_i^\sharp \mapsto E_i^\sharp$ is correct with respect to $f_i : D_i \mapsto E_i$ iff $f_i(\gamma_{D_i^\sharp}(x^\sharp)) \sqsubseteq_{E_i} \gamma_{E_i^\sharp}(f_i^\sharp(x^\sharp))$ for each $x^\sharp \in D_i^\sharp$.
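The correctness condition $f_i(\gamma_{D_i^\sharp}(x^\sharp)) \sqsubseteq_{E_i} \gamma_{E_i^\sharp}(f_i^\sharp(x^\sharp))$ can be checked exhaustively on a toy instance. The Python sketch below is purely illustrative and unrelated to groundness: the concrete function is integer negation on a small finite universe, the abstract values are signs (with `"+"` taken to include 0), and the image of $\gamma$ is a Moore family. A correct abstract negation passes the check, while one that forgets that $-0 = 0$ is non-negative fails it.

```python
UNIVERSE = range(-3, 4)          # a small finite stand-in for the integers
ABSTRACT = ["bot", "+", "-", "u"]


def gamma(a: str) -> set:
    """Concretisation of a sign; its image {}, {>=0}, {<0}, all is a Moore family."""
    if a == "bot":
        return set()
    if a == "+":
        return {n for n in UNIVERSE if n >= 0}
    if a == "-":
        return {n for n in UNIVERSE if n < 0}
    return set(UNIVERSE)


def neg(xs: set) -> set:          # concrete semantic function f
    return {-x for x in xs}


def neg_abs(a: str) -> str:       # candidate abstract function f#
    return {"bot": "bot", "+": "u", "-": "+", "u": "u"}[a]


def neg_abs_wrong(a: str) -> str:  # forgets that -0 = 0 is not negative
    return {"bot": "bot", "+": "-", "-": "+", "u": "u"}[a]


def is_correct(f, f_abs, gamma_in, gamma_out, values) -> bool:
    """f(gamma_in(x)) must be included in gamma_out(f_abs(x)) for every x."""
    return all(f(gamma_in(x)) <= gamma_out(f_abs(x)) for x in values)


print(is_correct(neg, neg_abs, gamma, gamma, ABSTRACT))
```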

Abstract Interpretation Framework. The abstract interpretation framework in […], which we use to present the parametric groundness analysis, is based on a concrete semantics of logic programs that is defined in terms of two operators on $(\wp(Sub), \subseteq)$. One is the set union $\cup$ and the other is UNIFY, defined as follows. Let $a_1, a_2 \in A_{\Pi,\Sigma,VI}$ and $\Theta_1, \Theta_2 \in \wp(Sub)$.

$$UNIFY(a_1, \Theta_1, a_2, \Theta_2) = \{unify(a_1, \theta_1, a_2, \theta_2) \neq fail \mid \theta_1 \in \Theta_1 \wedge \theta_2 \in \Theta_2\}$$

where $unify(a_1, \theta_1, a_2, \theta_2) = mgu(\rho(\theta_1(a_1)), \theta_2(a_2)) \circ \theta_2$ and $\rho$ is a renaming substitution satisfying $(vars(\theta_1) \cup vars(a_1)) \cap (vars(\theta_2) \cup vars(a_2)) = \emptyset$.

Specializing the framework for a program analysis consists in designing an abstract domain $(ASub, \sqsubseteq)$, a monotonic function $\gamma_{ASub} : ASub \mapsto \wp(Sub)$ such that $\gamma_{ASub}(ASub)$ is a Moore family, and an abstract operator AUNIFY on $(ASub, \sqsubseteq)$ such that, for any $a_1, a_2 \in A_{\Pi,\Sigma,VI}$ and any $\pi_1, \pi_2 \in ASub$,

$$UNIFY(a_1, \gamma_{ASub}(\pi_1), a_2, \gamma_{ASub}(\pi_2)) \subseteq \gamma_{ASub}(AUNIFY(a_1, \pi_1, a_2, \pi_2)).$$

No abstract counterpart of $\cup$ needs to be designed, since monotonicity of $\gamma_{ASub}$ implies that $\gamma_{ASub}(\pi_1) \cup \gamma_{ASub}(\pi_2) \subseteq \gamma_{ASub}(\pi_1 \sqcup \pi_2)$, where $\sqcup$ is the least upper bound operator on $(ASub, \sqsubseteq)$. Elements of $ASub$ are called abstract substitutions since they describe sets of substitutions. The abstract operator AUNIFY is called the abstract unification operator as its main functionality is to simulate unification.
4 Non-parametric Groundness Analysis

This section reformulates the groundness analysis presented in […] that uses the abstract domain for groundness proposed in […]. The reformulated groundness analysis will be used in Section 5 to obtain the parametric groundness analysis.

4.1 Abstract Domain 

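A set of substitutions is described by associating each variable in $VI$ with a groundness mode from $MO$. The abstract domain is thus $(Con, \sqsubseteq_{Con})$ where $Con \stackrel{def}{=} VI \mapsto MO$ and $\sqsubseteq_{Con}$ is the pointwise extension of $\leq$.¹ $(Con, \sqsubseteq_{Con})$ is a complete lattice. The set of substitutions described by an abstract substitution in $Con$ is given by a function $\gamma_{Con} : Con \mapsto \wp(Sub)$ defined as follows.

$$\gamma_{Con}(\theta^\sharp) \stackrel{def}{=} \{\theta \mid \forall X \in VI.((\theta^\sharp(X) = g) \rightarrow (vars(\theta(X)) = \emptyset))\}$$

$\gamma_{Con}$ is a monotonic function from $(Con, \sqsubseteq_{Con})$ to $(\wp(Sub), \subseteq)$. A substitution $\theta$ is said to satisfy an abstract substitution $\theta^\sharp$ if $\theta \in \gamma_{Con}(\theta^\sharp)$.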

The abstract unification operator for the non-parametric groundness analysis 
also deals with groundness of renamed variables. Let VI^ VI U I^{VI). We 

define Con^ vP ^ MO and 7j„,(6»“) = {9 \ VX G VI^ .{{9^{X) = g) ^ 
(vars(9(X)) = 0)}. 
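The definition of $\gamma_{Con}$ translates directly into a membership test. The sketch below is a minimal Python illustration under three assumptions: substitutions are idempotent dictionaries from variable names to terms; terms use a hypothetical tuple encoding in which variables are strings; and modes are the strings "g" and "u". The names `satisfies` and `term_vars` are illustrative.

```python
from typing import Dict, Set, Tuple, Union

Term = Union[str, Tuple]   # variable = str; compound/constant = (functor, *args)

def term_vars(t: Term) -> Set[str]:
    # vars(t): the set of variables occurring in term t.
    if isinstance(t, str):
        return {t}
    out: Set[str] = set()
    for a in t[1:]:
        out |= term_vars(a)
    return out

def satisfies(theta: Dict[str, Term], abs_sub: Dict[str, str]) -> bool:
    """theta is in gamma_Con(abs_sub): every variable mapped to "g" is bound to a ground term.

    Assumes theta is idempotent; variables outside dom(theta) are mapped to themselves.
    """
    for x, mode in abs_sub.items():
        if mode == "g" and term_vars(theta.get(x, x)):
            return False
    return True
```

Raising a mode from "g" to "u" can only enlarge the set of satisfying substitutions, which reflects the monotonicity of $\gamma_{Con}$.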

Lemma 1. $\gamma_{Con}(Con)$ and $\widehat{\gamma}_{Con}(\widehat{Con})$ are Moore families.

4.2 Abstract Unification 

Algorithm 1 defines the abstract unification operator $AUNIFY_{Con}$ for the non-parametric groundness analysis. Let $\theta^\#, \sigma^\# \in Con$ and $a_1, a_2 \in Atom_{VI}$. The algorithm proceeds as follows.

First, the renaming substitution $\Psi$ is applied to $a_1$ and $\theta^\#$ to obtain $\Psi(a_1)$ and $\Psi(\theta^\#)$. Then $\Psi(\theta^\#)$ and $\sigma^\#$ are combined to obtain $\zeta^\# = \Psi(\theta^\#) \cup \sigma^\#$. Note that $\zeta^\# \in \widehat{Con}$, and a substitution satisfying $\zeta^\#$ satisfies both $\Psi(\theta^\#)$ and $\sigma^\#$.

Next, $E_0 = mgu(\Psi(a_1), a_2)$ is computed. If $E_0 = fail$, the algorithm returns $\{X \mapsto g \mid X \in VI\}$, the infimum of $(Con, \sqsubseteq_{Con})$.

Otherwise, the algorithm computes $\eta^\# = DOWN_{Con}(E_0, \zeta^\#)$: if a variable $X$ occurs in $t$ for some $(Y/t) \in E_0$ and $Y$ is ground in $\zeta^\#$, then $X$ is ground in $\eta^\#$. It then computes $\beta^\# = UP_{Con}(E_0, \eta^\#)$: if $(Y/t) \in E_0$ and all variables in $t$ are ground in $\eta^\#$, then $Y$ is ground in $\beta^\#$. Finally, the algorithm restricts $\beta^\#$ to $VI$ and returns the result.

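Algorithm 1. Let $\theta^\#, \sigma^\# \in Con$ and $a_1, a_2 \in Atom_{VI}$.

$$
\begin{array}{l}
AUNIFY_{Con}(a_1, \theta^\#, a_2, \sigma^\#) = \\
\quad \textbf{let } E_0 = mgu(\Psi(a_1), a_2) \textbf{ in} \\
\quad\quad \textbf{if } E_0 \neq fail \\
\quad\quad \textbf{then } UP_{Con}(E_0, DOWN_{Con}(E_0, \Psi(\theta^\#) \cup \sigma^\#)) \restriction VI \\
\quad\quad \textbf{else } \{X \mapsto g \mid X \in VI\}
\end{array}
$$

$$
DOWN_{Con}(E, \zeta^\#) = \lambda X.
\begin{cases}
\zeta^\#(X), & \text{if } X \notin range(E) \\
\zeta^\#(X) \wedge \left(\bigwedge_{(Y/t) \in E \,\wedge\, X \in vars(t)} \zeta^\#(Y)\right), & \text{otherwise}
\end{cases}
$$

$$
UP_{Con}(E, \eta^\#) = \lambda X.
\begin{cases}
\eta^\#(X), & \text{if } X \notin dom(E) \\
\eta^\#(X) \wedge \left(\bigvee_{Y \in vars(E(X))} \eta^\#(Y)\right), & \text{otherwise}
\end{cases}
$$

¹ An element $f$ in $Con$ is represented as $\{x \in VI \mid f(x) = g\}$ in the literature.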
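The two steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the SWI-Prolog implementation of Section 6. It assumes each binding Y/t in the solved-form mgu E is abstracted to the set vars(t), since only the variables of t matter for groundness, and it encodes the modes as "g" and "u" with g ≤ u. The names `down_con`, `up_con`, `glb` and `lub` are hypothetical.

```python
from typing import Dict, Set

Mode = str                  # "g" (definitely ground) or "u" (unknown); g <= u
Con = Dict[str, Mode]       # non-parametric abstract substitution
Eqns = Dict[str, Set[str]]  # solved-form mgu, abstracted: Y -> vars(t) for each Y/t

def glb(*modes: Mode) -> Mode:
    # Greatest lower bound in MO; the glb of no modes is the top element u.
    return "g" if "g" in modes else "u"

def lub(*modes: Mode) -> Mode:
    # Least upper bound in MO; the lub of no modes is the bottom element g.
    return "u" if "u" in modes else "g"

def down_con(E: Eqns, zeta: Con) -> Con:
    # X becomes ground if it occurs in t for some Y/t in E with Y ground.
    rng = set().union(*E.values())
    return {x: zeta[x] if x not in rng
            else glb(zeta[x], *[zeta[y] for y, vs in E.items() if x in vs])
            for x in zeta}

def up_con(E: Eqns, eta: Con) -> Con:
    # Y becomes ground if every variable of its binding t is ground.
    return {x: eta[x] if x not in E
            else glb(eta[x], lub(*[eta[v] for v in E[x]]))
            for x in eta}
```

For E = {X/f(Y, Z)}, a ground X forces Y and Z to be ground in `down_con`, while ground Y and Z make X ground in `up_con`. A binding to a ground term (an empty variable set) makes its variable ground, because the empty lub is g.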



The following theorem states the correctness of the non-parametric groundness analysis.

Theorem 1. For any $\theta^\#, \sigma^\# \in Con$ and any $a_1, a_2 \in Atom_{VI}$,

$$UNIFY(a_1, \gamma_{Con}(\theta^\#), a_2, \gamma_{Con}(\sigma^\#)) \subseteq \gamma_{Con}(AUNIFY_{Con}(a_1, \theta^\#, a_2, \sigma^\#))$$



5 Parametric Groundness Analysis 

The input and the output of the parametric groundness analysis necessarily contain a set $Para$ of groundness parameters. After analysis, these parameters are instantiated by an assignment of groundness modes to groundness parameters, that is, by a function from $Para$ to $MO$. The parametric groundness analysis therefore needs to propagate the groundness information encoded by groundness parameters in a particular way. Instantiating its output by a groundness assignment $\kappa$ must yield the same groundness information as first instantiating its input by $\kappa$ and then performing the non-parametric groundness analysis.



5.1 Abstract Domain 

We first consider how to describe groundness of a variable in the presence of groundness parameters. In the non-parametric groundness analysis, groundness of a variable is described by a groundness mode from $MO$. Propagation of groundness then reduces to computing least upper bounds and greatest lower bounds of groundness modes from $MO$.

In the parametric groundness analysis, groundness descriptions of a variable contain parameters. Their least upper bound and greatest lower bound therefore cannot be evaluated to an element of $MO$ or an element of $Para$ during analysis. We resolve this problem by delaying the least upper bound and greatest lower bound computations. This requires that groundness of a variable be described by an expression built from elements of $MO$, elements of $Para$, the least upper bound operator $\vee$ and the greatest lower bound operator $\wedge$. It can be shown that

Observation 1. Any expression formed as above is equivalent to an expression of the form $\bigvee_{i \in I}(\bigwedge_{j \in J_i} \alpha^i_j)$ where $\alpha^i_j \in Para$.

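Expression $\bigvee_{i \in I}(\bigwedge_{j \in J_i} \alpha^i_j)$ is represented as a set $\mathcal{S}$ of subsets of groundness parameters. Let $\mathcal{S} = \{S_1, S_2, \ldots, S_n\}$ and $S_i = \{\alpha^i_1, \ldots, \alpha^i_{k_i}\}$. $\mathcal{S}$ stands for $\bigvee_{1 \leq i \leq n}(\bigwedge_{1 \leq j \leq k_i} \alpha^i_j)$. This is a function from $(Para \mapsto MO)$ to $MO$ defined as $\mathcal{S}(\kappa) \stackrel{def}{=} \bigvee_{1 \leq i \leq n}(\bigwedge_{1 \leq j \leq k_i} \kappa(\alpha^i_j))$.

For any $\kappa \in (Para \mapsto MO)$, $\emptyset(\kappa) = g$ since $\bigvee \emptyset = g$, and $\{\emptyset\}(\kappa) = u$ since $\bigwedge \emptyset = u$. Thus, $\emptyset$ and $\{\emptyset\}$ represent modes $g$ and $u$ respectively.

There may be two parametric groundness descriptions $\mathcal{S}_1$ and $\mathcal{S}_2$ such that $\mathcal{S}_1(\kappa) = \mathcal{S}_2(\kappa)$ for any $\kappa \in (Para \mapsto MO)$. We follow the normal practice in program analysis of identifying descriptions that have the same denotation. Define relations $\preceq$ and $\equiv$ on $\wp(\wp(Para))$ as

$$\mathcal{S}_1 \preceq \mathcal{S}_2 \stackrel{def}{=} \forall S_1 \in \mathcal{S}_1.\, \exists S_2 \in \mathcal{S}_2.\, (S_2 \subseteq S_1) \qquad \mathcal{S}_1 \equiv \mathcal{S}_2 \stackrel{def}{=} (\mathcal{S}_1 \preceq \mathcal{S}_2) \wedge (\mathcal{S}_2 \preceq \mathcal{S}_1)$$

Then $\equiv$ is an equivalence relation on $\wp(\wp(Para))$. The domain of parametric groundness descriptions is $(PMO, \trianglelefteq)$ where

$$PMO \stackrel{def}{=} \wp(\wp(Para))/_{\equiv} \qquad [\mathcal{S}_1]_{\equiv} \trianglelefteq [\mathcal{S}_2]_{\equiv} \stackrel{def}{=} \mathcal{S}_1 \preceq \mathcal{S}_2$$

$(PMO, \trianglelefteq)$ is a complete lattice. Its infimum is $[\emptyset]_{\equiv}$ and its supremum is $[\{\emptyset\}]_{\equiv}$. The least upper bound of $[\mathcal{S}_1]_{\equiv}$ and $[\mathcal{S}_2]_{\equiv}$ is $[\mathcal{S}_1]_{\equiv} \oplus [\mathcal{S}_2]_{\equiv} = [\mathcal{S}_1 \cup \mathcal{S}_2]_{\equiv}$. Their greatest lower bound is $[\mathcal{S}_1]_{\equiv} \otimes [\mathcal{S}_2]_{\equiv} = [\{S_1 \cup S_2 \mid S_1 \in \mathcal{S}_1 \wedge S_2 \in \mathcal{S}_2\}]_{\equiv}$.

A parametric groundness description $[\mathcal{S}]_{\equiv} \in PMO$ is a function from $(Para \mapsto MO)$ to $MO$ defined as $[\mathcal{S}]_{\equiv}(\kappa) \stackrel{def}{=} \mathcal{S}(\kappa)$. In other words, a groundness assignment instantiates a parametric groundness description to a non-parametric groundness description, that is, to a groundness mode in $MO$.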
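A small executable model helps to show why delaying the lattice operations is sound. The following Python sketch represents each class of PMO by the antichain of its subset-minimal inner sets. A member that is a superset of another member is absorbed by the lub, so this gives one canonical representative per $\equiv$-class. The names `normalize`, `oplus`, `otimes`, `instantiate` and `assignments` are illustrative. Under every assignment, instantiating $\oplus$ and $\otimes$ should agree with $\vee$ and $\wedge$ on MO.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Iterator, List

Desc = FrozenSet[FrozenSet[str]]  # canonical representative of a class [S] in PMO

def normalize(family: Iterable[Iterable[str]]) -> Desc:
    # Keep only subset-minimal inner sets; supersets are absorbed by the outer lub.
    sets = {frozenset(s) for s in family}
    return frozenset(a for a in sets if not any(b < a for b in sets))

def oplus(s1: Desc, s2: Desc) -> Desc:
    # Least upper bound: [S1 ∪ S2].
    return normalize(s1 | s2)

def otimes(s1: Desc, s2: Desc) -> Desc:
    # Greatest lower bound: [{A ∪ B | A in S1, B in S2}].
    return normalize(a | b for a in s1 for b in s2)

def instantiate(s: Desc, kappa: Dict[str, str]) -> str:
    # S(kappa) = lub over inner sets of glb over their parameters, with g <= u:
    # an inner glb is g iff some parameter is g; the lub is g iff every glb is g.
    def inner(members: FrozenSet[str]) -> str:
        return "g" if any(kappa[a] == "g" for a in members) else "u"
    return "g" if all(inner(m) == "g" for m in s) else "u"

def assignments(paras: List[str]) -> Iterator[Dict[str, str]]:
    # All 2^|Para| groundness assignments kappa : Para -> MO.
    for vals in product("gu", repeat=len(paras)):
        yield dict(zip(paras, vals))
```

For example, `normalize([["a"], ["a", "b"]])` collapses to {{a}}, because $a \vee (a \wedge b) = a$.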

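Let $|X|$ be the number of elements in set $X$, and let $size([\mathcal{S}]_{\equiv}) \stackrel{def}{=} \ldots$. It can be shown that

Lemma 2. The height of $PMO$ is $\ldots$ and $size([\mathcal{S}]_{\equiv})$ is $O(|Para|\,2^{|Para|})$ for any $[\mathcal{S}]_{\equiv} \in PMO$.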

A parametric abstract substitution is a function that maps a variable in $VI$ to a groundness description in $PMO$. The domain of parametric abstract substitutions is $(PCon, \sqsubseteq_{PCon})$, where $PCon \stackrel{def}{=} VI \mapsto PMO$ and $\sqsubseteq_{PCon}$ is the pointwise extension of $\trianglelefteq$. $(PCon, \sqsubseteq_{PCon})$ is a complete lattice whose infimum is $\{x \mapsto [\emptyset]_{\equiv} \mid x \in VI\}$.

A parametric abstract substitution $\theta^\# \in PCon$ can be thought of as a function from $(Para \mapsto MO)$ to $Con$ defined as $\theta^\#(\kappa) \stackrel{def}{=} \lambda X \in VI.((\theta^\#(X))(\kappa))$. That is, a groundness assignment instantiates a parametric abstract substitution to a non-parametric abstract substitution. The meaning of a parametric abstract substitution is given by $\gamma_{PCon} : PCon \mapsto ((Para \mapsto MO) \mapsto \wp(Sub))$, defined as follows.

$$\gamma_{PCon}(\theta^\#) \stackrel{def}{=} \lambda\kappa.\{\theta \mid \forall x \in VI.\,(((\theta^\#(x))(\kappa) = g) \rightarrow (vars(\theta(x)) = \emptyset))\}$$



Example 2. Let $Para = \{\alpha, \beta, \gamma\}$ and $VI = \{x, y, z\}$. Then $\theta^\# = \{x \mapsto [\{\{\alpha, \gamma\}\}]_{\equiv}, y \mapsto [\{\{\beta, \gamma\}\}]_{\equiv}, z \mapsto [\{\{\alpha, \gamma\}, \{\beta, \gamma\}\}]_{\equiv}\}$ is a parametric abstract substitution. $\gamma_{PCon}(\theta^\#)$ is the following function from groundness assignments to sets of substitutions.

$$\{\alpha \mapsto g, \beta \mapsto g, \gamma \mapsto g\} \mapsto \{\theta \in Sub \mid vars(\theta(x)) = vars(\theta(y)) = vars(\theta(z)) = \emptyset\}$$
$$\vdots$$
$$\{\alpha \mapsto u, \beta \mapsto u, \gamma \mapsto u\} \mapsto Sub$$
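The function $\gamma_{PCon}(\theta^\#)$ of Example 2 can be tabulated by instantiating each description under all eight groundness assignments. This minimal Python sketch writes descriptions as lists of inner parameter lists. `mode` and `ground_vars` are illustrative helpers; `ground_vars` returns the variables that every substitution in $\gamma_{PCon}(\theta^\#)(\kappa)$ must bind to ground terms.

```python
from itertools import product

# Example 2: each description is a list of inner sets (glbs) joined by an outer lub.
theta = {
    "x": [["alpha", "gamma"]],
    "y": [["beta", "gamma"]],
    "z": [["alpha", "gamma"], ["beta", "gamma"]],
}

def mode(desc, kappa):
    # lub over inner sets of glb over parameters, with g <= u.
    return "g" if all(any(kappa[p] == "g" for p in inner) for inner in desc) else "u"

def ground_vars(kappa):
    # Variables that every substitution in gamma_PCon(theta)(kappa) binds to ground terms.
    return {v for v, desc in theta.items() if mode(desc, kappa) == "g"}

table = {vals: ground_vars(dict(zip(["alpha", "beta", "gamma"], vals)))
         for vals in product("gu", repeat=3)}
```

`table[("g", "g", "g")]` is {"x", "y", "z"} and `table[("u", "u", "u")]` is empty, matching the two entries listed above. Assigning g to $\gamma$ alone is already enough to make all three variables ground.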



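Let $\widehat{PCon} \stackrel{def}{=} \widehat{VI} \mapsto PMO$ and $\widehat{\gamma}_{PCon}(\theta^\#) \stackrel{def}{=} \lambda\kappa.\{\theta \mid \forall x \in \widehat{VI}.\,(((\theta^\#(x))(\kappa) = g) \rightarrow (vars(\theta(x)) = \emptyset))\}$.

Lemma 3. $\gamma_{PCon}(PCon)$ and $\widehat{\gamma}_{PCon}(\widehat{PCon})$ are Moore families.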



5.2 Abstract Unification 

Algorithm 2 defines an abstract unification operator for the parametric groundness analysis. It is obtained from the operator for the non-parametric groundness analysis in three steps: non-parametric groundness descriptions are replaced with parametric groundness descriptions; $\vee$ and $\wedge$ are replaced by $\oplus$ and $\otimes$ respectively; and $AUNIFY_{Con}$, $DOWN_{Con}$ and $UP_{Con}$ are renamed to $AUNIFY_{PCon}$, $DOWN_{PCon}$ and $UP_{PCon}$ respectively.

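Algorithm 2. Let $\theta^\#, \sigma^\# \in PCon$ and $a_1, a_2 \in Atom_{VI}$.

$$
\begin{array}{l}
AUNIFY_{PCon}(a_1, \theta^\#, a_2, \sigma^\#) = \\
\quad \textbf{let } E_0 = mgu(\Psi(a_1), a_2) \textbf{ in} \\
\quad\quad \textbf{if } E_0 \neq fail \\
\quad\quad \textbf{then } UP_{PCon}(E_0, DOWN_{PCon}(E_0, \Psi(\theta^\#) \cup \sigma^\#)) \restriction VI \\
\quad\quad \textbf{else } \{X \mapsto [\emptyset]_{\equiv} \mid X \in VI\}
\end{array}
$$

$$
DOWN_{PCon}(E, \zeta^\#) = \lambda X.
\begin{cases}
\zeta^\#(X), & \text{if } X \notin range(E) \\
\zeta^\#(X) \otimes \left(\bigotimes_{(Y/t) \in E \,\wedge\, X \in vars(t)} \zeta^\#(Y)\right), & \text{otherwise}
\end{cases}
$$

$$
UP_{PCon}(E, \eta^\#) = \lambda X.
\begin{cases}
\eta^\#(X), & \text{if } X \notin dom(E) \\
\eta^\#(X) \otimes \left(\bigoplus_{Y \in vars(E(X))} \eta^\#(Y)\right), & \text{otherwise}
\end{cases}
$$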



Lemma 4. The time complexity of $AUNIFY_{PCon}$ is $O(|U|^2\,|Para|^2\,2^{2|Para|})$, and that of $\sqcup_{PCon}$ is $O(|VI|\,|Para|\,2^{|Para|})$, where $\sqcup_{PCon}$ is the least upper bound operator on $PCon$.

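Example 3. This example illustrates how $AUNIFY_{PCon}$ works. Let

$$
\begin{array}{l}
VI = \{X, Y, Z\} \\
A = g(X, f(Y, f(Z, Z)), Y) \\
B = g(f(X, Y), Z, X) \\
\theta^\# = \{X \mapsto \{\{\alpha_1, \alpha_2\}\},\ Y \mapsto \{\{\alpha_1, \alpha_3\}\},\ Z \mapsto \{\{\alpha_2, \alpha_3\}\}\} \\
\sigma^\# = \{X \mapsto \{\{\alpha_1\}, \{\alpha_2\}\},\ Y \mapsto \{\{\alpha_2, \alpha_3\}\},\ Z \mapsto \{\emptyset\}\}
\end{array}
$$

Suppose $\Psi = \{X \mapsto X_0, Y \mapsto Y_0, Z \mapsto Z_0\}$. We have

$$
\begin{array}{rcl}
\Psi(A) & = & g(X_0, f(Y_0, f(Z_0, Z_0)), Y_0) \\
\Psi(\theta^\#) & = & \{X_0 \mapsto \{\{\alpha_1, \alpha_2\}\},\ Y_0 \mapsto \{\{\alpha_1, \alpha_3\}\},\ Z_0 \mapsto \{\{\alpha_2, \alpha_3\}\}\} \\
\zeta^\# & = & \Psi(\theta^\#) \cup \sigma^\# \\
 & = & \{X_0 \mapsto \{\{\alpha_1, \alpha_2\}\},\ Y_0 \mapsto \{\{\alpha_1, \alpha_3\}\},\ Z_0 \mapsto \{\{\alpha_2, \alpha_3\}\}, \\
 & & \ \ X \mapsto \{\{\alpha_1\}, \{\alpha_2\}\},\ Y \mapsto \{\{\alpha_2, \alpha_3\}\},\ Z \mapsto \{\emptyset\}\} \\
E_0 & = & mgu(\Psi(A), B) = \{X_0 = f(Y_0, Y),\ Z = f(Y_0, f(Z_0, Z_0)),\ X = Y_0\} \\
\eta^\# & = & DOWN_{PCon}(E_0, \zeta^\#) \\
 & = & \{X_0 \mapsto \{\{\alpha_1, \alpha_2\}\},\ Y_0 \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Z_0 \mapsto \{\{\alpha_2, \alpha_3\}\}, \\
 & & \ \ X \mapsto \{\{\alpha_1\}, \{\alpha_2\}\},\ Y \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Z \mapsto \{\emptyset\}\} \\
\beta^\# & = & UP_{PCon}(E_0, \eta^\#) \\
 & = & \{X_0 \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Y_0 \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Z_0 \mapsto \{\{\alpha_2, \alpha_3\}\}, \\
 & & \ \ X \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Y \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Z \mapsto \{\{\alpha_2, \alpha_3\}\}\}
\end{array}
$$

Finally,

$$
\begin{array}{rcl}
AUNIFY_{PCon}(A, \theta^\#, B, \sigma^\#) & = & \beta^\# \restriction VI \\
 & = & \{X \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Y \mapsto \{\{\alpha_1, \alpha_2, \alpha_3\}\},\ Z \mapsto \{\{\alpha_2, \alpha_3\}\}\}
\end{array}
$$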
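The computation in Example 3 can be replayed mechanically. The sketch below is a minimal Python model built on two assumptions. First, descriptions are canonical antichains of frozensets. Second, the solved form $E_0$ is abstracted to the variable sets of its right-hand sides ($X_0 \mapsto \{Y_0, Y\}$, $Z \mapsto \{Y_0, Z_0\}$, $X \mapsto \{Y_0\}$). The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

Desc = FrozenSet[FrozenSet[str]]

def D(*sets: Iterable[str]) -> Desc:
    # Build a canonical description (subset-minimal inner sets), e.g. D({"a1", "a2"}).
    ss = {frozenset(s) for s in sets}
    return frozenset(a for a in ss if not any(b < a for b in ss))

def oplus(*ds: Desc) -> Desc:
    # lub: union of the families; the lub of nothing is [∅], i.e. g.
    return D(*(a for d in ds for a in d))

def otimes(*ds: Desc) -> Desc:
    # glb: pairwise unions of inner sets; the glb of nothing is [{∅}], i.e. u.
    acc = D(set())
    for d in ds:
        acc = D(*(a | b for a in acc for b in d))
    return acc

def down_pcon(E: Dict[str, Set[str]], zeta: Dict[str, Desc]) -> Dict[str, Desc]:
    rng = set().union(*E.values())
    return {x: zeta[x] if x not in rng
            else otimes(zeta[x], *[zeta[y] for y, vs in E.items() if x in vs])
            for x in zeta}

def up_pcon(E: Dict[str, Set[str]], eta: Dict[str, Desc]) -> Dict[str, Desc]:
    return {x: eta[x] if x not in E
            else otimes(eta[x], oplus(*[eta[v] for v in E[x]]))
            for x in eta}

zeta = {"X0": D({"a1", "a2"}), "Y0": D({"a1", "a3"}), "Z0": D({"a2", "a3"}),
        "X": D({"a1"}, {"a2"}), "Y": D({"a2", "a3"}), "Z": D(set())}
E0 = {"X0": {"Y0", "Y"}, "Z": {"Y0", "Z0"}, "X": {"Y0"}}
eta = down_pcon(E0, zeta)
beta = up_pcon(E0, eta)
result = {v: beta[v] for v in ("X", "Y", "Z")}
```

Restricting `beta` to X, Y and Z gives {{a1, a2, a3}}, {{a1, a2, a3}} and {{a2, a3}}, the result stated above. The intermediate value `eta["Y0"]` is {{a1, a2, a3}}, as in $\eta^\#$.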



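The following theorem states that instantiating the output of the parametric groundness analysis by a groundness assignment yields the same groundness information as first instantiating its input by the same groundness assignment and then performing the non-parametric groundness analysis.

Theorem 2. Let $[\mathcal{S}_1]_{\equiv}, [\mathcal{S}_2]_{\equiv} \in PMO$, $\theta^\#, \sigma^\# \in PCon$, $\zeta^\# \in \widehat{PCon}$, $a_1, a_2 \in Atom_{VI}$ and $E \in \wp(Eqn)$. For any $\kappa \in (Para \mapsto MO)$,

(a) $([\mathcal{S}_1]_{\equiv} \otimes [\mathcal{S}_2]_{\equiv})(\kappa) = (\mathcal{S}_1(\kappa) \wedge \mathcal{S}_2(\kappa))$ and $([\mathcal{S}_1]_{\equiv} \oplus [\mathcal{S}_2]_{\equiv})(\kappa) = (\mathcal{S}_1(\kappa) \vee \mathcal{S}_2(\kappa))$;

(b) $(DOWN_{PCon}(E, \zeta^\#))(\kappa) = DOWN_{Con}(E, \zeta^\#(\kappa))$;

(c) $(UP_{PCon}(E, \zeta^\#))(\kappa) = UP_{Con}(E, \zeta^\#(\kappa))$; and

(d) $(AUNIFY_{PCon}(a_1, \theta^\#, a_2, \sigma^\#))(\kappa) = AUNIFY_{Con}(a_1, \theta^\#(\kappa), a_2, \sigma^\#(\kappa))$.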



The following theorem establishes the correctness of the parametric groundness analysis.

Theorem 3. For any $a_1, a_2 \in Atom_{VI}$, $\kappa \in (Para \mapsto MO)$, and $\theta^\#, \sigma^\# \in PCon$,

$$UNIFY(a_1, \gamma_{PCon}(\theta^\#)(\kappa), a_2, \gamma_{PCon}(\sigma^\#)(\kappa)) \subseteq \gamma_{PCon}(AUNIFY_{PCon}(a_1, \theta^\#, a_2, \sigma^\#))(\kappa)$$
6 Implementation

We have implemented the parametric groundness analysis and the abstract interpretation framework in SWI-Prolog. The abstract interpretation framework is implemented using O'Keefe's least fixed-point algorithm [?]. Both the framework and the parametric groundness analysis are implemented as meta-interpreters, using ground representations for program variables and groundness parameters.

6.1 An Example 

Example 4. The following is the permutation sort program from [?] (Chapter 3), together with the result of the parametric groundness analysis. Sets are represented by lists. In the results, $X \mapsto T$ is written as X/T, $\alpha$ as alpha and $\beta$ as beta. Program points marked (a) and (b) are referred to later.

%[Li/[[alpha]],Lo/[[beta]]]                                  (a)
sort(Li,Lo).
%[Li/[[alpha]],Lo/[[alpha,beta]]]

select(X,[X|Xs],Xs).
%[X/[[alpha,beta]],Xs/[[alpha]]]
select(X,[Y|Ys],[Y|Zs]) :-
    %[X/[[beta]],Y/[[alpha]],Ys/[[alpha]],Zs/[[]]]
    select(X,Ys,Zs).
    %[X/[[alpha,beta]],Y/[[alpha]],Ys/[[alpha]],Zs/[[alpha]]]

ordered([]).
%[]
ordered([X]).
%[X/[[alpha,beta]]]
ordered([X,Y|Ys]) :-
    %[X/[[alpha,beta]],Y/[[alpha,beta]],Ys/[[alpha,beta]]]   (b)
    X =< Y,
    %[X/[],Y/[],Ys/[[alpha,beta]]]
    ordered([Y|Ys]).
    %[X/[],Y/[],Ys/[]]

permutation(Xs,[Z|Zs]) :-
    %[Xs/[[alpha]],Z/[[beta]],Zs/[[beta]],Ys/[[]]]
    select(Z,Xs,Ys),
    %[Xs/[[alpha]],Z/[[alpha,beta]],Zs/[[beta]],Ys/[[alpha]]]
    permutation(Ys,Zs).
    %[Xs/[[alpha]],Z/[[alpha,beta]],Zs/[[alpha,beta]],Ys/[[alpha]]]
permutation([],[]).
%[]

sort(Xs,Ys) :-
    %[Xs/[[alpha]],Ys/[[beta]]]
    permutation(Xs,Ys),
    %[Xs/[[alpha]],Ys/[[alpha,beta]]]
    ordered(Ys).
    %[Xs/[[alpha]],Ys/[[alpha,beta]]]

The top-level goal is sort(Li, Lo), and the input abstract substitution at program point (a) is $\{Li \mapsto \{\{\alpha\}\}, Lo \mapsto \{\{\beta\}\}\}$. It says that the groundness mode of Li is $\alpha$ and that of Lo is $\beta$. The parametric groundness analysis infers an abstract substitution for every other program point. The abstract substitution at program point (b) associates the parametric groundness description $\alpha \wedge \beta$ with variables X, Y and Ys.

The result can be instantiated by any of the four groundness assignments in $\{\alpha, \beta\} \mapsto MO$. Let $\kappa = \{\alpha \mapsto g, \beta \mapsto u\}$. Then $\kappa$ instantiates the input abstract substitution to $\{Li \mapsto g, Lo \mapsto u\}$, and the abstract substitution at program point (b) to $\{X \mapsto g, Y \mapsto g, Ys \mapsto g\}$. This indicates that if the goal sort(Li, Lo) is called with Li ground, then X, Y and Ys are ground when X =< Y is invoked.

Since the abstract substitution at program point (b) maps both X and Y to $\alpha \wedge \beta$, assigning g to either $\alpha$ or $\beta$ makes X and Y ground before X =< Y is executed. In that case the run-time groundness check at program point (b) can be eliminated.
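The reasoning about program point (b) amounts to evaluating $\alpha \wedge \beta$ under the four groundness assignments. Below is a minimal Python sketch. The description is encoded as a list of inner parameter lists, and the hypothetical helper `check_removable` decides whether both operands of X =< Y are ground.

```python
from itertools import product

# Description at program point (b) for X and Y: {{alpha, beta}}, i.e. alpha ∧ beta.
point_b = {"X": [["alpha", "beta"]], "Y": [["alpha", "beta"]]}

def mode(desc, kappa):
    # lub over inner sets of glb over parameters, with g <= u.
    return "g" if all(any(kappa[p] == "g" for p in inner) for inner in desc) else "u"

def check_removable(kappa):
    # The run-time groundness check for X =< Y is redundant when X and Y are both ground.
    return all(mode(point_b[v], kappa) == "g" for v in ("X", "Y"))

decisions = {vals: check_removable(dict(zip(["alpha", "beta"], vals)))
             for vals in product("gu", repeat=2)}
```

The check can be dropped under three of the four assignments: every one except $\alpha \mapsto u, \beta \mapsto u$.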

6.2 Performance 

The SWI-Prolog implementation of the parametric groundness analysis has been tested on a set of benchmark programs. The experiments were run on a 1.0GHz Dell desktop with Windows 2000 Professional and SWI-Prolog 3.4.0.

Table 1 shows the time performance of the implementation. Each row except the last corresponds to a specific input. An input consists of a program, a goal, and an input abstract substitution that specifies the groundness of the variables in the goal. The program and the goal are listed in the first and third columns. The input abstract substitution associates each variable in the goal with a different groundness parameter. For instance, the abstract substitution for the first row is $\{X \mapsto \{\{\alpha\}\}, Y \mapsto \{\{\beta\}\}\}$. The second column gives the size of the program, measured as the number of program points in it. Each fact p is treated as a clause p ← true, which has two program points. The fourth column is the time in seconds spent on the input. The last row gives the total size of the programs and the total time.

Table 1 indicates that the prototype parametric groundness analyzer spends an average of 1.72 seconds to process one thousand program points. This is an acceptable speed for most logic programs. We believe the time performance can still be improved through a better implementation, because both meta-programming and the ground representation of variables significantly slow down the prototype.

The same table compares the performance of the parametric groundness analysis with that of the non-parametric groundness analysis presented in [?]. The latter uses a subset of $VI$ as an abstract substitution, namely the variables that are definitely ground under all substitutions described by the abstract substitution. This allows operators on abstract substitutions to be optimized. The non-parametric groundness analysis is implemented in the same way as the parametric groundness analysis.

Table 1. Performance of Parametric Groundness Analysis.

Program               Points  Goal                        Poly (sec)  Mono (sec)  Ratio  Assignments
Buggy Quick Sort      38      qs(A, B)                    0.038       0.016       2.375  4
Exponentiation        27      exp(A, B, C)                0.01        0.009       1.111  8
Factorial             25      factorial(A, B)             0.008       0.005       1.6    4
Graph Connectivity    50      connected(A, B)             0.012       0.009       1.333  4
Heapify Binary Trees  27      heapify(A, B)               0.043       0.015       2.867  4
Improved Quick Sort   22      iqsort(A, B)                0.025       0.009       2.778  4
Interchange Sort      24      sort(A, B)                  0.015       0.005       3      4
List Insertion        23      insert(A, B, C)             0.01        0.008       1.25   8
Permutation Sort      26      sort(A, B)                  0.018       0.008       2.25   4
Quicksort with D-List 22      quicksort(A, B)             0.027       0.012       2.25   4
Tree Sort             34      treesort(A, B)              0.038       0.014       2.714  4
ann                   653     go(A)                       0.911       0.541       1.684  2
asm                   904     asm_PIL(A, B)               1.188       0.855       1.389  4
boyer                 351     tautology(A)                0.269       0.16        1.681  2
browse                132     q                           0.073       0.06        1.217  1
chat                  1368    chat_parser                 4.326       2.554       1.694  1
cs_r                  348     pgenconfig(A)               0.649       0.42        1.545  2
disj_r                180     top(A)                      0.103       0.088       1.17   2
dnf                   95      go                          0.285       0.239       1.192  1
ga                    503     test_ga                     0.541       0.531       1.019  1
gabriel               131     main(A, B)                  0.122       0.072       1.694  4
kalah                 298     play(A, B)                  0.215       0.144       1.493  4
life                  115     lift(A, B, C, D)            0.04        0.04        1      16
mastermind            238     play                        0.13        0.1         1.3    1
meta                  110     interpret(A)                0.139       0.073       1.904  2
nand                  624     main(A)                     1.117       0.921       1.213  2
naughts_and_crosses   137     play(A)                     0.067       0.042       1.595  2
nbody                 454     go(A, B)                    0.404       0.235       1.719  4
neural                382     test(A, B)                  0.257       0.119       2.16   4
peep                  541     comppeepopt(A, B, C)        1.07        0.538       1.989  8
press                 455     test_press(A, B)            1.624       0.626       2.594  4
queens                33      queens(A, B)                0.01        0.007       1.429  4
read                  500     read(A, B)                  1.863       1.219       1.528  4
reducer               408     try(A, B)                   0.719       0.426       1.688  4
ronp                  110     puzzle(A)                   0.097       0.05        1.94   2
sdda                  355     dojdda(test, A, B, C)       0.462       0.23        2.009  8
semi                  216     go(A, B)                    0.773       0.382       2.024  4
serialize             50      go(A)                       0.083       0.033       2.515  2
simple_analyzer       560     main(A)                     0.716       0.39        1.836  2
tictactoe             286     play(A)                     0.34        0.263       1.293  2
tree order            39      v2t(A, B, C)                0.038       0.021       1.81   8
tsp                   153     tsp(A, B, C, D, E)          0.164       0.087       1.885  32
zebra                 64      zebra(A, B, C, D, E, F, G)  0.048       0.025       1.92   128
Total                 11111                               19.087                  1.78   7.4

The number of different groundness assignments for the parametric groundness analysis is two to the power of the number of groundness parameters. For each assignment, the corresponding non-parametric groundness analysis was performed and measured. The fifth column lists the average time in seconds spent on these non-parametric groundness analyses. The sixth column lists the ratio of the fourth column to the fifth column. The seventh column lists the number of groundness assignments.

The table shows that the parametric groundness analysis takes from 1.0 to 3.0 times as long as the non-parametric groundness analysis. On average, the parametric groundness analysis is 78% slower. This is because parametric groundness descriptions are more complex than non-parametric ones, so the abstract unification operator and the least upper bound operator for the parametric groundness analysis are more costly than those for the non-parametric groundness analysis.

The result of the parametric groundness analysis is much more general than that of the non-parametric groundness analysis: it can be instantiated once for each different groundness assignment. The average number of groundness assignments is 7.4, which is 4.2 times the average performance ratio of 1.78. To derive a sufficient condition for safely removing groundness checks on builtin calls, the non-parametric groundness analysis must be run once per groundness assignment. In this setting, the parametric groundness analysis is 4.2 times more efficient.
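The summary figures quoted in this section can be recomputed from the columns of Table 1. The following Python sketch transcribes the Points, Poly, Ratio and Assignments columns (the 43 benchmark rows, excluding the last row) and recomputes the totals and averages. It is only a consistency check on the reported numbers, not part of the analyzer.

```python
# Columns of Table 1, in row order (the 43 benchmark rows; the last row is excluded).
points = [38, 27, 25, 50, 27, 22, 24, 23, 26, 22, 34,
          653, 904, 351, 132, 1368, 348, 180, 95, 503, 131, 298, 115, 238,
          110, 624, 137, 454, 382, 541, 455, 33, 500, 408, 110, 355, 216, 50,
          560, 286, 39, 153, 64]
poly = [0.038, 0.01, 0.008, 0.012, 0.043, 0.025, 0.015, 0.01, 0.018, 0.027, 0.038,
        0.911, 1.188, 0.269, 0.073, 4.326, 0.649, 0.103, 0.285, 0.541, 0.122, 0.215, 0.04, 0.13,
        0.139, 1.117, 0.067, 0.404, 0.257, 1.07, 1.624, 0.01, 1.863, 0.719, 0.097, 0.462, 0.773, 0.083,
        0.716, 0.34, 0.038, 0.164, 0.048]
ratio = [2.375, 1.111, 1.6, 1.333, 2.867, 2.778, 3, 1.25, 2.25, 2.25, 2.714,
         1.684, 1.389, 1.681, 1.217, 1.694, 1.545, 1.17, 1.192, 1.019, 1.694, 1.493, 1, 1.3,
         1.904, 1.213, 1.595, 1.719, 2.16, 1.989, 2.594, 1.429, 1.528, 1.688, 1.94, 2.009, 2.024, 2.515,
         1.836, 1.293, 1.81, 1.885, 1.92]
assignments = [4, 8, 4, 4, 4, 4, 4, 8, 4, 4, 4,
               2, 4, 2, 1, 1, 2, 2, 1, 1, 4, 4, 16, 1,
               2, 2, 2, 4, 4, 8, 4, 4, 4, 4, 2, 8, 4, 2,
               2, 2, 8, 32, 128]

def summary():
    total_points = sum(points)
    total_poly = sum(poly)
    return {
        "total_points": total_points,
        "total_poly": round(total_poly, 3),
        # Seconds of parametric analysis per thousand program points.
        "sec_per_1000_points": round(total_poly / total_points * 1000, 2),
        "avg_ratio": round(sum(ratio) / len(ratio), 2),
        "avg_assignments": round(sum(assignments) / len(assignments), 1),
    }
```

The check reproduces 11111 program points and 19.087 seconds in total, which is about 1.72 seconds per thousand points. It also gives an average ratio of 1.78 and an average of 7.4 assignments, whose quotient is about 4.2.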



7 Related Work 



The parametric groundness analysis has been obtained from a non-parametric groundness analysis that uses a simple groundness domain. Groundness is useful both for compile-time program optimizations and for improving the precision of other program analyses, so more powerful groundness domains have been studied. These domains consist of propositional formulae over program variables, which act as propositional variables.

Dart uses the domain Def of definite propositional formulae to capture groundness dependency between variables [?]. For instance, the definite propositional formula $x \leftarrow (y \wedge z)$ represents the groundness dependency that x is bound to a ground term if y and z are bound to ground terms. Def consists of propositional formulae whose models are closed under set intersection [?].

Marriott and Søndergaard use the domain Pos (also called Prop) of positive propositional formulae [?]. A propositional formula f is positive if f is true when all propositional variables in f are true. Pos is strictly more powerful than Def. It has been further studied in [?] and has several implementations [?].
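The definitions of Pos and Def can be checked by brute force on small formulae. In this minimal Python sketch, formulae are Python predicates over an environment of truth values. Def is taken to be the positive formulae whose models are closed under intersection, the usual convention under which Def is a subset of Pos. `models`, `is_pos` and `is_def` are illustrative names.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Sequence

Formula = Callable[[Dict[str, bool]], bool]

def models(f: Formula, vs: Sequence[str]) -> List[FrozenSet[str]]:
    # Each model is represented by the set of variables it assigns true.
    result = []
    for bits in product([False, True], repeat=len(vs)):
        env = dict(zip(vs, bits))
        if f(env):
            result.append(frozenset(v for v, b in env.items() if b))
    return result

def is_pos(f: Formula, vs: Sequence[str]) -> bool:
    # Positive: f is true when all its variables are true.
    return f({v: True for v in vs})

def is_def(f: Formula, vs: Sequence[str]) -> bool:
    # Definite: positive, and the set of models is closed under intersection.
    ms = set(models(f, vs))
    return is_pos(f, vs) and all((a & b) in ms for a in ms for b in ms)
```

$x \leftarrow (y \wedge z)$ is both positive and definite. $x \vee y$ is positive but not definite, because its models {x} and {y} intersect in a non-model. This is one way to see that Pos is strictly more expressive than Def.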

Giacobazzi and Scozzari reconstruct Def and Pos from Con via Heyting completion. Here Con is the subdomain of Pos consisting of propositional formulae that are conjunctions of propositional variables. The Sharing domain proposed by Jacobs and Langen for sharing analysis also contains groundness information. Cortesi et al. prove that the groundness information contained in Sharing is exactly that captured by Def. Codish and Søndergaard recently discovered that Sharing is isomorphic to Pos in structure [?].

Pos-based analyzers using binary decision diagrams have been shown to be precise and efficient on benchmark programs. However, Pos-based analyzers come with no efficiency guarantee: in the worst case they require an exponential number of iterations or exponentially large data structures [?]. More abstract domains have been proposed, offering different trade-offs between the precision and the efficiency of analysis.

Pos-based goal-independent groundness analyzers enjoy the favorable property of being condensing [?]. Consider an analysis F that infers output information $F(P, \phi)$ from a program P and input information $\phi$. F is condensing if $F(P, \phi \wedge \psi) = F(P, \phi) \wedge \psi$ for any P, $\phi$ and $\psi$. A condensing analysis can therefore be performed with partial input information $\phi$. Its output can then be conjoined with additional input information $\psi$ to obtain the output that would result from analyzing the program with the complete input information $\phi \wedge \psi$. Hence, a Pos-based goal-independent groundness analysis is also parametric, since its result can be instantiated by logical conjunction.

[?] and [?] present two approaches to condensing goal-independent groundness analysis using program transformation and bottom-up evaluation. An atomic call in the transformed program in [?] contains both the variables of interest at a program point in the original program and the variables in the query. Thus, the abstract domain for goal-independent groundness analysis in [?] is $Pos_{Para \cup VI}$, since variables in the query play the role of groundness parameters. A similar argument can be made for [?].

Groundness analyzers in [?] use Pos with top-down abstract interpretation frameworks to perform goal-dependent groundness analysis [?]. These analyzers project a Pos formula onto the variables occurring in the clause to which the program point belongs. As a result, they fail to capture groundness dependency between variables at a program point and variables in a query. Let $Pos_X$ denote the set of positive Boolean functions over $X$, a set of propositional variables. The following fix should make a top-down Pos-based groundness analysis condensing and hence parametric: the abstract domain $Pos_{VI}$ is extended to $Pos_{Para \cup VI}$, and the projection operation $\lambda f.\exists_{VI}.f$ is replaced with $\lambda f.\exists_{(Para \cup VI)}.f$.

Pos-based groundness analysis is parametric and more precise than the parametric groundness analysis, but the parametric groundness analysis is more efficient. The cost of an analysis is determined by the number of iterations performed and the cost of the operations performed in each iteration.

The height of $Pos_{Para \cup VI}$ (the abstract domain in a Pos-based goal-dependent groundness analysis) is $O(2^{|VI|}\,2^{|Para|})$ [?]. The height of PCon (the abstract domain in the parametric groundness analysis) is $O(|VI|\,2^{|Para|})$. The parametric groundness analysis therefore performs far fewer iterations than a Pos-based groundness analysis. The abstract operations $AUNIFY_{PCon}$ and $\sqcup_{PCon}$ are also much less expensive than $\vee$, $\wedge$ and existential quantification on positive Boolean functions over $(Para \cup VI)$ in a Pos-based groundness analysis. For these reasons, the parametric groundness analysis is more efficient than a Pos-based groundness analysis.

Furthermore, when used with $Para = \emptyset$, the parametric groundness analysis has the same asymptotic time and space complexity as a Con-based groundness analysis. It only costs more than a Con-based analysis when it infers more general results.



append([], L, L).
append([H|L1], L2, [H|L3]) ← append(L1, L2, L3).
← ① append(Xs, Ys, Zs). ②

Fig. 2. The append Program.



It is interesting to note that the parametric groundness analysis also captures some groundness dependency among variables.

Example 5. For the program and the goal in Figure 2, the parametric groundness analysis infers

① : $\{Xs \mapsto \{\{\alpha\}\},\ Ys \mapsto \{\{\beta\}\},\ Zs \mapsto \{\{\gamma\}\}\}$

② : $\{Xs \mapsto \{\{\alpha, \gamma\}\},\ Ys \mapsto \{\{\beta, \gamma\}\},\ Zs \mapsto \{\{\alpha, \gamma\}, \{\beta, \gamma\}\}\}$

This implies that whenever Xs and Ys are bound to ground terms at point ②, Zs is bound to a ground term at the same point. To bind Xs to a ground term, g must be assigned to either $\alpha$ or $\gamma$. To bind Ys to a ground term, g must be assigned to either $\beta$ or $\gamma$. Any groundness assignment satisfying both conditions evaluates $\{\{\alpha, \gamma\}, \{\beta, \gamma\}\}$ to g. So we have $Xs \wedge Ys \rightarrow Zs$ in Pos. Similarly, we can infer $Zs \rightarrow Xs \wedge Ys$.
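The dependency claimed in Example 5 can be confirmed by enumerating all eight groundness assignments. Below is a minimal Python sketch. The descriptions at point ② are written as lists of inner parameter lists, and `mode` and `dependency_holds` are illustrative helpers.

```python
from itertools import product

# Descriptions inferred at point ② (inner lists are glbs, the outer list is a lub).
point2 = {
    "Xs": [["alpha", "gamma"]],
    "Ys": [["beta", "gamma"]],
    "Zs": [["alpha", "gamma"], ["beta", "gamma"]],
}

def mode(desc, kappa):
    # lub over inner sets of glb over parameters, with g <= u.
    return "g" if all(any(kappa[p] == "g" for p in inner) for inner in desc) else "u"

def dependency_holds() -> bool:
    # Check Xs ∧ Ys -> Zs and Zs -> Xs ∧ Ys under every groundness assignment.
    for vals in product("gu", repeat=3):
        kappa = dict(zip(["alpha", "beta", "gamma"], vals))
        xs, ys, zs = (mode(point2[v], kappa) == "g" for v in ("Xs", "Ys", "Zs"))
        if (xs and ys) != zs:
            return False
    return True
```

The check succeeds. Under every assignment, Zs is instantiated to g exactly when both Xs and Ys are.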

In general, suppose the abstract substitution at a program point assigns $\mathcal{R}_j$ to $Y_j$ for $1 \leq j \leq l$ and $\mathcal{S}_i$ to $X_i$ for $1 \leq i \leq k$, and that $\bigoplus_{1 \leq j \leq l} \mathcal{R}_j \trianglelefteq \bigoplus_{1 \leq i \leq k} \mathcal{S}_i$. Then the Pos-like proposition $\bigwedge_{1 \leq i \leq k} X_i \rightarrow \bigwedge_{1 \leq j \leq l} Y_j$ holds at the program point. Thus the parametric groundness analysis also captures groundness dependency between program variables. However, the degree to which it captures this kind of dependency is limited. In particular, when $Para = \emptyset$, the parametric groundness analysis degenerates to the non-parametric groundness analysis, which does not capture this kind of groundness dependency.

There have also been efforts to analyze logic programs for type dependency between program variables [?]. Although groundness modes in MO can be thought of as types, applying a type dependency analysis to infer groundness dependency brings no benefit. The abstract domains for type dependency analyses in [?] are more complex, so their abstract operations are more costly than those required in a groundness dependency analysis. Furthermore, their abstract domains have infinite increasing chains, so they must employ a widening operator.

[?] obtains a parametric type analysis from a non-parametric type analysis. Since types have much richer structure than groundness modes, equational constraints over parametric types must be incorporated into its abstract domain to propagate type dependency precisely. This makes the abstract operations costly. Moreover, there are infinitely many assignments of types to type parameters, so precision is lost when abstract operations in the parametric type analysis mimic those in the non-parametric type analysis. The parametric groundness analysis presented in this paper has a much simpler abstract domain, and its abstract operations precisely mimic those in the non-parametric groundness analysis.

8 Conclusion 

We have presented a new groundness analysis, called parametric groundness 
analysis, that infers groundness of variables parameterized by groundness param- 
eters that can be instantiated after analysis. The parametric groundness analysis 
is obtained by generalizing a non-parametric groundness analysis. Experimen- 
tal results with a prototype implementation of the analysis are promising. The 
parametric groundness analysis is as precise as the non-parametric groundness 
analysis. 

The parametric groundness analysis is theoretically faster but less precise than a Pos-based groundness analysis. As future work, we would like to compare the time and precision of the parametric groundness analysis experimentally with those of Pos-based groundness analyses.
Abstract. Logic languages based on the theory of rational, possibly infinite, trees have much appeal, because rational trees allow for faster unification (due to the omission of the occurs-check) and increased expressivity. Note that cyclic terms can provide a very efficient representation of grammars and other useful objects. Unfortunately, the use of infinite rational trees has problems. For instance, many of the built-in and library predicates are ill-defined for such trees and need to be supplemented by run-time checks whose cost may be significant. Moreover, some widely-used program analysis and manipulation techniques are only correct for those parts of programs working over finite trees. It is thus important to obtain, automatically, knowledge of those program variables (the finite variables) that, at the program points of interest, will always be bound to finite terms. For these reasons, we propose here a new data-flow analysis that captures such information. We present a parametric domain where a simple component for recording finite variables is coupled with a generic domain (the parameter of the construction) providing sharing information. The sharing domain is abstractly specified so as to guarantee the correctness of the combined domain and the generality of the approach.



1 Introduction 

The intended computation domain of most logic-based languages¹ includes the algebra (or structure) of finite trees. Other (constraint) logic-based languages, such as Prolog II and its successors, SICStus Prolog [?], and Oz [?], refer to a computation domain of rational trees. A rational tree is a possibly infinite tree with a finite number of distinct subtrees in which, as for finite trees, each node has a finite number of immediate descendants. These properties ensure that rational trees can be finitely represented, even though they are infinite in the sense that they admit paths of infinite length. One possible representation uses connected, rooted, directed and possibly cyclic graphs whose nodes are labeled with variable and function symbols, as is the case for finite trees.
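A rational tree such as the one created by X = [a|X] can be stored as a finite cyclic graph. The Python sketch below assumes a hypothetical `Node` class: a functor with children, or an unbound variable when the functor is None. It shows that finiteness and groundness are different properties. A depth-first search that tracks the current path detects the cycle, while a groundness check with a visited set reports that the term contains no variables. Both checks must visit the whole reachable graph, which is one reason such run-time checks can be expensive.

```python
from typing import List, Optional

class Node:
    """Node of a (possibly cyclic) term graph; functor None marks an unbound variable."""
    def __init__(self, functor: Optional[str] = None, args: Optional[List["Node"]] = None):
        self.functor = functor
        self.args = args if args is not None else []

def is_finite(t: Node) -> bool:
    # DFS: reaching a node that is still on the current path means the term is cyclic.
    on_path, done = set(), set()
    def visit(n: Node) -> bool:
        if id(n) in done:
            return True
        if id(n) in on_path:
            return False
        on_path.add(id(n))
        ok = all(visit(c) for c in n.args)
        on_path.discard(id(n))
        if ok:
            done.add(id(n))
        return ok
    return visit(t)

def is_ground(t: Node) -> bool:
    # No variable node is reachable; the visited set makes this terminate on cycles.
    seen, stack = set(), [t]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        if n.functor is None:
            return False
        stack.extend(n.args)
    return True

# X = [a|X]: a list cell whose tail points back to the cell itself.
cyclic = Node(".")
cyclic.args = [Node("a"), cyclic]
```

Here `is_ground(cyclic)` is true while `is_finite(cyclic)` is false, so groundness alone does not guarantee that a term is a finite tree.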

Applications of rational trees in logic programming include graphics [?], parser generation and grammar manipulation [?], and computing with finite-state automata [?]. Other applications are described in [?] and [?]. Going from Prolog to CLP, [?] combines constraints on rational trees and record structures, while the logic-based language Oz allows constraints over rational and feature trees [?]. The expressive power of rational trees is put to use, for instance, in several areas of natural language processing. Rational trees are used in implementations of the HPSG formalism (Head-driven Phrase Structure Grammar) in the ALE system (Attribute Logic Engine) [?], and in the ProFIT system (Prolog with Features, Inheritance and Templates) [?].

While rational trees allow for increased expressivity, they also come equipped 
with a surprising number of problems. As we will see, some of these problems 
are so serious that rational trees must be used in a very controlled way,
disallowing them in any context where they are “dangerous”. This, in turn, causes
a secondary problem: in order to disallow rational trees in selected contexts one 
must first detect them, an operation that may be expensive. 

The first thing to be aware of is that almost any semantics-based program 
manipulation technique developed in the field of logic programming — whether 
it be an analysis, a transformation, or an optimization — assumes a computation 
domain of finite trees. Some of these techniques might work with the rational 
trees but their correctness has only been proved in the case of finite trees. Others 
are clearly inapplicable. Let us consider a very simple Prolog program: 

list([]).

list([_|T]) :- list(T). 

Most automatic and semi-automatic tools for proving program termination and
for complexity analysis agree on the fact that list/1 will terminate when
invoked with a ground argument. Consider now the query

?- X = [a|X], list(X).

and note that, after the execution of the first rational unification, the variable 
X will be bound to a rational term containing no variables, i.e., the predicate 
list/1 will be invoked with X ground. However, if such a query is given to, say,
SICStus Prolog, then the only way to get the prompt back is by pressing ^C.
The problem stems from the fact that the analysis techniques employed by these
tools are only sound for finite trees: as soon as they are applied to a system
where the creation of cyclic terms is possible, their results are inapplicable. The
situation can be improved by combining these termination and/or complexity
analyses with a finiteness analysis providing the precondition for the applicability
of the other techniques.
The implementation of built-in predicates is another problematic issue.
Indeed, it is widely acknowledged that, for the implementation of a system that
provides real support for rational trees, the biggest effort concerns proper
handling of built-ins. Of course, the meaning of ‘proper’ depends on the actual
built-in. Built-ins such as copy_term/2 and ==/2 maintain a clear semantics
when passing from finite to rational trees. For others, like sort/2, the extension
can be questionable:¹ both raising an exception and answering Y = [a] can be
argued to be “the right reaction” to the query

?- X = [a|X], sort(X, Y).

Other built-ins do not tolerate infinite trees in some argument positions. A good 
implementation should check for finiteness of the corresponding arguments and 
make sure “the right thing” — failing or raising an appropriate exception — al- 
ways happens. However, such behavior appears to be uncommon. A small ex- 
periment we conducted on six Prolog implementations with queries like 

?- X = 1+X, Y is X. 

?- X = [97|X], name(Y, X).

?- X = [X|X], Y =.. [f|X].

resulted in infinite loops, memory exhaustion and/or system thrashing, segmen- 
tation faults or other fatal errors. One of the implementations tested, SICStus 
Prolog, is a professional one and implements run-time checks to avoid most cases 
where built-ins can have catastrophic effects.² The remaining systems are a bit
more than research prototypes, but will clearly have to do the same if they evolve
to the stage of production tools. Again, a data-flow analysis aimed at the
detection of those variables that are definitely bound to finite terms would make it
possible to avoid a (possibly significant) fraction of the useless run-time checks. Note
that what has been said for built-in predicates applies to libraries as well. Even 
though it may be argued that it is enough for programmers to know that they 
should not use a particular library predicate with infinite terms, it is clear that 
the use of a “safe” library, including automatic checks which ensure that such 
predicates are never called with an illegal argument, will result in more robust 
systems. With the appropriate data-flow analyses, safe libraries do not have to 
be inefficient libraries. 

Another serious problem is the following: the ISO Prolog standard term 
ordering cannot be extended to rational trees [M. Carlsson, Personal
communication, October 2000]. Consider the rational trees defined by A = f(B, a)
and B = f(A, b). Clearly, A == B does not hold. Since the standard term
ordering is total, we must have either A @< B or B @< A. Assume A @< B. Then
f(A, b) @< f(B, a), since the ordering of terms having the same principal 
functor is inherited by the ordering of subterms considered in a left-to-right 
fashion. Thus B @< A must hold, which is a contradiction. A dual contradiction
is obtained by assuming B @< A. As a consequence, applying one of the Prolog
term-ordering predicates to one or two infinite terms may cause inconsistent
results, giving rise to bugs that are exceptionally difficult to diagnose. For this
reason, any system that extends ISO Prolog with rational trees ought to detect
such situations and make sure they are not ignored (e.g., by throwing an
exception or aborting execution with a meaningful message). However, predicates
such as the term-ordering ones are likely to be called a significant number of
times, since they are often used to maintain structures implementing ordered
collections of terms. This is another instance of the efficiency issue mentioned
above.

¹ Even though sort/2 is not required to be a built-in by the standard, it is offered as
such by several implementations.

² SICStus 3.8.5 still loops on ?- X = [97|X], name(Y, X).

In this paper, we present a parametric abstract domain for finite-tree analysis,
denoted by H × P. This domain combines a simple component H (the finiteness
component), recording the set of definitely finite variables, with a generic
domain P (the parameter of the construction), providing sharing information. 
The term “sharing information” is to be understood in its broader meaning, 
which includes variable aliasing, groundness, linearity, freeness and any other 
kind of information that can improve the precision on these components, such 
as explicit structural information. Several domain combinations and abstract 
operators, characterized by different precision/complexity trade-offs, have been 
proposed to capture these properties (see [...] for an account of some of them).
By giving a generic specification for this parameter component, in the style of
the open product construct proposed in [...], it is possible to define and establish
the correctness of the abstract operators on the finite-tree domain independently
of any particular domain for sharing analysis.

The paper is structured as follows. The required notations and preliminary
concepts are given in Section 2. The finite-tree domain is then introduced in
Section 3: Section 3.1 provides the specification of the parameter domain P;
Section 3.2 defines the abstraction function for the finiteness component H;
Section 3.3 defines the abstract unification operator for H × P. A description of
some ongoing work on the subject is given in Section 4, where a possible instance
of the parameter P is also specified. We conclude in Section 5.

A longer version of this paper with proofs of the results presented here is
available as a technical report [...].

2 Preliminaries 

2.1 Infinite Terms and Substitutions

For a set S, ℘(S) is the powerset of S, whereas ℘_f(S) is the set of all the finite
subsets of S. Let Sig denote a possibly infinite set of function symbols, ranked 
over the set of natural numbers. It is assumed that Sig contains at least one 
function symbol having rank 0 and one having rank greater than 0. Let Vars 
denote a denumerable set of variables, disjoint from Sig. Then Terms denotes the 
free algebra of all (possibly infinite) terms in the signature Sig having variables 
in Vars. Thus a term can be seen as an ordered labeled tree, possibly having 
some infinite paths and possibly containing variables: every inner node is labeled
with a function symbol in Sig with a rank matching the number of the node’s 
immediate descendants, whereas every leaf is labeled by either a variable in Vars 
or a function symbol in Sig having rank 0 (a constant). 

If t ∈ Terms then vars(t) and mvars(t) denote the set and the multiset of
variables occurring in t, respectively. We will also write vars(o) to denote the set
of variables occurring in an arbitrary syntactic object o. If a occurs more than
once in a multiset M we write a ∈∈ M.

Suppose s, t ∈ Terms: s and t are independent if vars(s) ∩ vars(t) = ∅; if
y ∈ vars(t) and ¬(y ∈∈ mvars(t)) we say that variable y occurs linearly in t,
more briefly written using the predication occ_lin(y, t); t is said to be ground
if vars(t) = ∅; t is free if t ∈ Vars; t is linear if, for all y ∈ vars(t), we have
occ_lin(y, t); finally, t is a finite term (or Herbrand term) if it contains a finite
number of occurrences of function symbols. The sets of all ground, linear and
finite terms are denoted by GTerms, LTerms and HTerms, respectively. As we
have specified that Sig contains function symbols of rank 0 and rank greater
than 0, GTerms ∩ HTerms ≠ ∅ and GTerms \ HTerms ≠ ∅.
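The predicates just defined are straightforward to compute on finite terms. As a minimal sketch, assuming a hypothetical Python representation in which a variable is a string and a term f(t1, ..., tn) is the tuple ("f", t1, ..., tn) (so a constant a is ("a",)), they can be written as follows; all function names are illustrative and not taken from any existing analyzer.

```python
from collections import Counter

# Hypothetical representation: a variable is a str; f(t1, ..., tn) is the
# tuple ("f", t1, ..., tn); a constant a is the 1-tuple ("a",).

def mvars(t):
    """Multiset of the variables occurring in the finite term t."""
    if isinstance(t, str):
        return Counter([t])
    counts = Counter()
    for arg in t[1:]:
        counts.update(mvars(arg))
    return counts

def vars_of(t):
    """Set of the variables occurring in t."""
    return set(mvars(t))

def independent(s, t):
    return not (vars_of(s) & vars_of(t))

def occ_lin(y, t):
    """y occurs in t, and occurs exactly once."""
    return mvars(t)[y] == 1

def is_ground(t):
    return not vars_of(t)

def is_free(t):
    return isinstance(t, str)

def is_linear(t):
    return all(occ_lin(y, t) for y in vars_of(t))
```

For instance, f(X, g(Y, X)) is not linear because X occurs twice in it, while Y occurs linearly.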

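A substitution is a total function σ: Vars → HTerms that is the identity
almost everywhere; in other words, the domain of σ,

$$\mathit{dom}(\sigma) \stackrel{\mathrm{def}}{=} \{\, x \in \mathit{Vars} \mid \sigma(x) \neq x \,\},$$

is finite. Given a substitution σ: Vars → HTerms, we overload the symbol ‘σ’
so as to denote also the function σ: HTerms → HTerms defined as follows, for
each term t ∈ HTerms:

$$\sigma(t) \stackrel{\mathrm{def}}{=} \begin{cases} t, & \text{if } t \text{ is a constant symbol;}\\ \sigma(t), & \text{if } t \in \mathit{Vars};\\ f\bigl(\sigma(t_1), \ldots, \sigma(t_n)\bigr), & \text{if } t = f(t_1, \ldots, t_n). \end{cases}$$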

If x ∈ Vars and t ∈ HTerms \ {x}, then x ↦ t is called a binding. The set
of all bindings is denoted by Bind. Substitutions are denoted by the set of their
bindings, thus a substitution σ is identified with the (finite) set

$$\{\, x \mapsto \sigma(x) \mid x \in \mathit{dom}(\sigma) \,\}.$$

We denote by vars(σ) the set of variables occurring in the bindings of σ.

A substitution is said to be circular if, for n > 1, it has the form

$$\{\, x_1 \mapsto x_2, \ldots, x_{n-1} \mapsto x_n, x_n \mapsto x_1 \,\},$$

where x_1, ..., x_n are distinct variables. A substitution is in rational solved form
if it has no circular subset. The set of all substitutions in rational solved form is
denoted by RSubst.
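Since a circular subset consists only of variable-to-variable bindings, checking membership in RSubst amounts to looking for a cycle among such bindings. A sketch under the same hypothetical term representation (variables as strings, compound terms as tuples), with a substitution modeled as a Python dict from variables to terms that contains no identity bindings:

```python
def is_rational_solved_form(sigma):
    """True iff the substitution sigma (dict var -> term, no x -> x bindings)
    has no circular subset {x1 -> x2, ..., xn -> x1} with n > 1."""
    for start in sigma:
        seen = set()
        x = start
        # Follow only bindings whose right-hand side is a variable.
        while x in sigma and isinstance(sigma[x], str):
            if x in seen:
                return False  # a chain of variable bindings closes a cycle
            seen.add(x)
            x = sigma[x]
    return True
```

Note that {x ↦ f(x)} passes the check even though it describes an infinite tree: only cycles made purely of variables are excluded.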

If t ∈ HTerms, we write tσ to denote σ(t) and t[x/s] to denote t{x ↦ s}.
The composition of substitutions is defined in the usual way. Thus τ ∘ σ is
the substitution such that, for all terms t ∈ HTerms,

$$(\tau \circ \sigma)(t) = \tau\bigl(\sigma(t)\bigr)$$

and has the formulation

$$\tau \circ \sigma = \{\, x \mapsto x\sigma\tau \mid x \in \mathit{dom}(\sigma),\ x \neq x\sigma\tau \,\} \cup \{\, x \mapsto x\tau \mid x \in \mathit{dom}(\tau) \setminus \mathit{dom}(\sigma) \,\}.$$
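The formulation of τ ∘ σ translates directly into code. The following sketch uses the same hypothetical representation; apply and compose are illustrative names. It drops the bindings that become identities, exactly as the first set in the formula requires:

```python
def apply(sigma, t):
    """Return t sigma, i.e. sigma(t), for a finite term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(sigma, arg) for arg in t[1:])

def compose(tau, sigma):
    """Return tau o sigma, so that apply(compose(tau, sigma), t)
    equals apply(tau, apply(sigma, t))."""
    result = {}
    for x, s in sigma.items():
        u = apply(tau, s)      # x sigma tau
        if u != x:             # omit identity bindings
            result[x] = u
    for x, u in tau.items():
        if x not in sigma:     # x in dom(tau) \ dom(sigma)
            result[x] = u
    return result
```

For example, composing τ = {y ↦ a, z ↦ w} after σ = {x ↦ f(y)} gives {x ↦ f(a), y ↦ a, z ↦ w}.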

As usual, σ⁰ denotes the identity function (i.e., the empty substitution) and,
when i > 0, σⁱ denotes the substitution (σ ∘ σ^{i−1}).

For each σ ∈ RSubst, s ∈ HTerms, the sequence of finite terms

$$\sigma^0(s), \sigma^1(s), \sigma^2(s), \ldots$$

converges to a (possibly infinite) term, denoted σ^∞(s). Therefore, the
function rt: HTerms × RSubst → Terms such that

$$\mathit{rt}(s, \sigma) \stackrel{\mathrm{def}}{=} \sigma^\infty(s)$$

is well defined. Note that, in general, this function is not a substitution: while
having a finite domain, its “bindings” x ↦ t can map a domain variable x into
a term t ∈ Terms \ HTerms.
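Only finite prefixes of σ^∞(s) can be computed directly. The sketch below (same hypothetical representation; approximants and depth are illustrative names) builds σ⁰(s), ..., σⁿ(s) by repeated application. When rt(s, σ) is finite the sequence becomes constant, whereas for σ = {x ↦ f(x)} every step adds one more f, approximating the infinite tree f(f(f(...))).

```python
def apply(sigma, t):
    """Return sigma(t) for a finite term t (variables are str, terms tuples)."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply(sigma, arg) for arg in t[1:])

def approximants(sigma, s, n):
    """Return the list [sigma^0(s), sigma^1(s), ..., sigma^n(s)]."""
    seq = [s]
    for _ in range(n):
        seq.append(apply(sigma, seq[-1]))  # sigma^i(s) = sigma(sigma^(i-1)(s))
    return seq

def depth(t):
    """Number of function-symbol levels on the deepest path of t."""
    if isinstance(t, str):
        return 0
    return 1 + max((depth(arg) for arg in t[1:]), default=0)
```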



2.2 Equations 



An equation is of the form s = t where s, t ∈ HTerms. Eqs denotes the set of all
equations. A substitution σ may be regarded as a finite set of equations, that is,
as the set { x = t | x ↦ t ∈ σ }. We say that a set of equations e is in rational
solved form if { s ↦ t | (s = t) ∈ e } ∈ RSubst. In the rest of the paper, we will
often write a substitution σ ∈ RSubst to denote a set of equations in rational
solved form (and vice versa).

Languages such as Prolog II, SICStus and Oz are based on ℛ𝒯, the theory
of rational trees [...]. This is a syntactic equality theory (i.e., a theory where
the function symbols are uninterpreted), augmented with a uniqueness axiom
for each substitution in rational solved form. Informally speaking, these axioms
state that, after assigning a ground rational tree to each non-domain variable,
the substitution uniquely defines a ground rational tree for each of its domain
variables. Thus, any set of equations in rational solved form is, by definition,
satisfiable in ℛ𝒯. Note that being in rational solved form is a very weak property.
Indeed, unification algorithms returning a set of equations in rational solved form
are allowed to be much more “lazy” than one would usually expect. We refer the
interested reader to [...] for details on the subject.

Given a set of equations e ∈ ℘_f(Eqs) that is satisfiable in ℛ𝒯, a substitution
σ ∈ RSubst is called a solution for e in ℛ𝒯 if ℛ𝒯 ⊢ ∀(σ → e), i.e., if every
model of the theory ℛ𝒯 is also a model of the first-order formula ∀(σ → e). If in
addition vars(σ) ⊆ vars(e), then σ is said to be a relevant solution for e. Finally,
σ is a most general solution for e in ℛ𝒯 if ℛ𝒯 ⊢ ∀(σ ↔ e). In this paper, the set
of all the relevant most general solutions for e in ℛ𝒯 will be denoted by mgs(e).
2.3 The Concrete Domain

Throughout the paper, we assume a knowledge of the basic concepts of abstract
interpretation theory [...].

For the purpose of this paper, we assume a concrete domain constituted by
pairs of the form (Σ, V), where V is a finite set of variables of interest and Σ is
a (possibly infinite) set of substitutions in rational solved form.

Definition 1. (The Concrete Domain.) Let D♭ ≝ ℘(RSubst) × ℘_f(Vars).
If (Σ, V) ∈ D♭, then (Σ, V) represents the (possibly infinite) set of first-order
formulas { ∃Δ . σ | σ ∈ Σ ∧ Δ = vars(σ) \ V } where σ is interpreted as the logical
conjunction of the equations corresponding to its bindings.

Concrete domains for constraint languages would be similar. If the analyzed 
language allows the use of constraints on various domains to restrict the values 
of the variable leaves of rational trees, the corresponding concrete domain would 
have one or more extra components to account for the constraints (see [2] for an
example).

The concrete element ({{x ↦ f(y)}}, {x, y}) expresses a dependency between
x and y. In contrast, ({{x ↦ f(y)}}, {x}) only constrains x. The same
concept can be expressed by saying that in the first case the variable name ‘y’
matters, but it does not in the second case. Thus, the set of variables of interest
is crucial for defining the meaning of the concrete and abstract descriptions.
Despite this, always specifying the set of variables of interest would significantly
clutter the presentation. Moreover, most of the needed functions on concrete and
abstract descriptions preserve the set of variables of interest. For these reasons,
we assume the existence of a set VI ∈ ℘_f(Vars) that contains, at each stage
of the analysis, the current variables of interest.³ As a consequence, when the
context makes it clear that Σ ∈ ℘(RSubst), we will write Σ ∈ D♭ as a shorthand
for (Σ, VI) ∈ D♭.

3 An Abstract Domain for Finiteness Analysis 

Finite-tree analysis applies to logic-based languages computing over a domain 
of rational trees where cyclic structures are allowed. In contrast, analyses aimed 
at occurs-check reduction apply to programs that are meant to compute
on a domain of finite trees only, but have to be executed over systems that are
either designed for rational trees or intended just for the finite trees but omit the 
occurs-check for efficiency reasons. Despite their different objectives, finite-tree 
and occurs-check analyses have much in common: in both cases, it is important 
to detect all program points where cyclic structures can be generated. 

³ This parallels what happens in the efficient implementation of data-flow analyzers.
In fact, almost all the abstract domains currently in use do not need to represent 
explicitly the set of variables of interest. In contrast, this set is maintained externally 
and in a unique copy, typically by the fixpoint computation engine. 
Note however that, when performing occurs-check reduction, one can take
advantage of the following invariant: all data structures generated so far are
finite. This property is maintained by transforming the program so as to force
finiteness whenever it is possible that a cyclic structure could have been built.⁴
In contrast, a finite-tree analysis has to deal with the more general case when
some of the data structures computed so far may be cyclic. It is therefore natural
to consider an abstract domain made up of two components. The first one simply
represents the set of variables that are guaranteed not to be bound to infinite
terms. We will denote this finiteness component by H (from Herbrand).

Definition 2. (The Finiteness Component.) The finiteness component is
the set H ≝ ℘(VI) partially ordered by reverse subset inclusion.

The second component of the finite-tree domain should maintain any kind of 
information that may be useful for computing finiteness information. 

It is well-known that sharing information as a whole, therefore including 
possible variable aliasing, definite linearity, and definite freeness, has a crucial 
role in occurs-check reduction so that, as observed before, it can be exploited 
for finite-tree analysis too. Thus, a first choice for the second component of the
finite-tree domain would be to consider one of the standard combinations of
sharing, freeness and linearity as defined, e.g., in [...]. However, this would
tie our specification to a particular sharing analysis domain, whereas the overall
approach seems to be inherently more general. For this reason, we will define a
finite-tree analysis based on the abstract domain schema H × P, where the generic
sharing component P is a parameter of the abstract domain construction. This
approach can be formalized as an application of the open product operator [...].

3.1 The Parameter Component P 

Elements of P can encode any kind of information. We only require that
substitutions that are equivalent in the theory ℛ𝒯 are identified in P.

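Definition 3. (The Parameter Component.) The parameter component P
is an abstract domain related to the concrete domain D♭ by means of the
concretization function γ_P: P → ℘(RSubst) such that, for all p ∈ P,

$$\bigl(\sigma \in \gamma_P(p) \wedge \mathcal{RT} \vdash \forall(\sigma \leftrightarrow \tau)\bigr) \implies \tau \in \gamma_P(p).$$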
The interface between H and P is provided by a set of predicates and func- 
tions that satisfy suitable correctness criteria. Note that, for space limitations, 
we will only specify those abstract operations that are useful to define abstract 
unification on the combined domain H × P. The other operations needed for a
full description of the analysis, such as renamings, upper bound operators and 
projections, are very simple and, as usual, do not pose any problems. 

⁴ Such a requirement is typically obtained by replacing the unification with a call to
unify_with_occurs_check/2. As an alternative, in some systems based on rational
trees it is possible to insert, after each problematic unification, a finiteness test for
the generated term.
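Definition 4. (Abstract Operators on P.) Let s, t ∈ HTerms be finite
terms. For each p ∈ P, we define the following predicates:

s and t are independent in p if and only if ind_p: HTerms² → Bool holds for
(s, t), where

$$\mathit{ind}_p(s, t) \stackrel{\mathrm{def}}{\iff} \forall \sigma \in \gamma_P(p) : \mathit{vars}(\mathit{rt}(s, \sigma)) \cap \mathit{vars}(\mathit{rt}(t, \sigma)) = \emptyset;$$

s and t share linearly in p if and only if share_lin_p: HTerms² → Bool holds for
(s, t), where

$$\mathit{share\_lin}_p(s, t) \stackrel{\mathrm{def}}{\iff} \forall \sigma \in \gamma_P(p) : \forall y \in \mathit{vars}(\mathit{rt}(s, \sigma)) \cap \mathit{vars}(\mathit{rt}(t, \sigma)) : \mathit{occ\_lin}\bigl(y, \mathit{rt}(s, \sigma)\bigr) \wedge \mathit{occ\_lin}\bigl(y, \mathit{rt}(t, \sigma)\bigr);$$

t is ground in p if and only if ground_p: HTerms → Bool holds for t, where

$$\mathit{ground}_p(t) \stackrel{\mathrm{def}}{\iff} \forall \sigma \in \gamma_P(p) : \mathit{rt}(t, \sigma) \in \mathit{GTerms};$$

t is ground-or-free in p if and only if gfree_p: HTerms → Bool holds for t, where

$$\mathit{gfree}_p(t) \stackrel{\mathrm{def}}{\iff} \forall \sigma \in \gamma_P(p) : \mathit{rt}(t, \sigma) \in \mathit{GTerms} \vee \mathit{rt}(t, \sigma) \in \mathit{Vars};$$

s and t are or-linear in p if and only if or_lin_p: HTerms² → Bool holds for (s, t),
where

$$\mathit{or\_lin}_p(s, t) \stackrel{\mathrm{def}}{\iff} \forall \sigma \in \gamma_P(p) : \mathit{rt}(s, \sigma) \in \mathit{LTerms} \vee \mathit{rt}(t, \sigma) \in \mathit{LTerms};$$

s is linear in p if and only if lin_p: HTerms → Bool holds for s, where

$$\mathit{lin}_p(s) \stackrel{\mathrm{def}}{\iff} \mathit{or\_lin}_p(s, s).$$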

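For each p ∈ P, the following functions compute subsets of the set of variables
of interest:

the function share_same_var_p: HTerms × HTerms → ℘(VI) returns a set of
variables that may share with the given terms via the same variable. For each
s, t ∈ HTerms,

$$\mathit{share\_same\_var}_p(s, t) \stackrel{\mathrm{def}}{=} \left\{\, y \in \mathit{VI} \;\middle|\; \begin{array}{l} \exists \sigma \in \gamma_P(p) \mathrel{.} \exists z \in \mathit{vars}(\mathit{rt}(y, \sigma)) \mathrel{.} \\ \quad z \in \mathit{vars}(\mathit{rt}(s, \sigma)) \cap \mathit{vars}(\mathit{rt}(t, \sigma)) \end{array} \right\};$$

the function share_with_p: HTerms → ℘(VI) yields a set of variables that may
share with the given term. For each t ∈ HTerms,

$$\mathit{share\_with}_p(t) \stackrel{\mathrm{def}}{=} \{\, y \in \mathit{VI} \mid y \in \mathit{share\_same\_var}_p(y, t) \,\}.$$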
The function amgu_P: P × Bind → P correctly captures the effects of a binding
on an element of P. For each (x ↦ t) ∈ Bind and p ∈ P, let

$$p' \stackrel{\mathrm{def}}{=} \mathit{amgu}_P(p, x \mapsto t).$$

For all σ ∈ γ_P(p), if τ ∈ mgs(σ ∪ {x = t}), then τ ∈ γ_P(p').

As will be shown in Section 4.1, some of these generic operators can be
directly mapped into the corresponding abstract operators defined for well-known
sharing analysis domains. However, the specification given in Definition 4,
besides being more general than a particular implementation, also allows for a
modular approach when proving correctness results.



3.2 The Abstraction Function for H 

When the concrete domain is based on the theory of finite trees, idempotent
substitutions provide a finitely computable strong normal form for domain
elements, meaning that different substitutions describe different sets of finite trees.⁵
In contrast, when working on a concrete domain based on the theory of rational
trees, substitutions in rational solved form, while being finitely computable,
no longer satisfy this property: there can be an infinite set of substitutions in
rational solved form all describing the same set of rational trees (i.e., the same
element in the “intended” semantics). For instance, the substitutions

$$\sigma_n = \{\, x \mapsto \underbrace{f(\cdots f}_{n}(x)\cdots) \,\}$$

for n = 1, 2, ..., all map the variable x into the same rational tree (which is
usually denoted by f^ω).

Ideally, a strong normal form for the set of rational trees described by a
substitution σ ∈ RSubst can be obtained by computing the limit σ^∞. The problem
is that we may end up with σ^∞ ∉ RSubst, as σ^∞ can map domain variables to
infinite rational terms.

This poses a non-trivial problem when trying to define a “good” abstraction
function, since it would be really desirable for this function to map any two
equivalent concrete elements to the same abstract element. As shown in [25], the
classical abstraction function for set-sharing analysis [...], which was defined
for idempotent substitutions only, does not enjoy this property when applied,
as it is, to arbitrary substitutions in rational solved form. A possibility is to
look for a more general abstraction function that makes it possible to obtain the
desired property. For example, in [23] the sharing-group operator sg of [...] is
replaced by an occurrence operator, occ, defined by means of a fixpoint computation.
We now provide a similar fixpoint construction defining the finiteness operator.



⁵ As usual, this is modulo the possible renaming of variables.
Definition 5. (Finiteness Functions.) For each n ∈ ℕ, the finiteness
function hvars_n: RSubst → ℘(Vars) is defined, for each σ ∈ RSubst, by

$$\mathit{hvars}_0(\sigma) \stackrel{\mathrm{def}}{=} \mathit{Vars} \setminus \mathit{dom}(\sigma)$$

and, for n > 0, by

$$\mathit{hvars}_n(\sigma) \stackrel{\mathrm{def}}{=} \mathit{hvars}_{n-1}(\sigma) \cup \{\, y \in \mathit{dom}(\sigma) \mid \mathit{vars}(y\sigma) \subseteq \mathit{hvars}_{n-1}(\sigma) \,\}.$$

For each σ ∈ RSubst and each i ≥ 0, we have hvars_i(σ) ⊆ hvars_{i+1}(σ) and
also that Vars \ hvars_i(σ) ⊆ dom(σ) is a finite set. By these two properties, the
following fixpoint computation is well defined and finitely computable.

Definition 6. (Finiteness Operator.) For each σ ∈ RSubst, the finiteness
operator hvars: RSubst → ℘(Vars) is given by hvars(σ) ≝ hvars_ℓ(σ), where
ℓ = ℓ(σ) ∈ ℕ is such that hvars_ℓ(σ) = hvars_n(σ) for all n ≥ ℓ.

The following proposition shows that the hvars operator precisely captures 
the intended property. 

Proposition 1. If σ ∈ RSubst and x ∈ Vars then

$$x \in \mathit{hvars}(\sigma) \iff \mathit{rt}(x, \sigma) \in \mathit{HTerms}.$$

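Example 1. Consider σ ∈ RSubst, where

$$\sigma = \{\, x_1 \mapsto f(x_2),\ x_2 \mapsto g(x_5),\ x_3 \mapsto f(x_4),\ x_4 \mapsto g(x_3) \,\}.$$

Then,

$$\begin{aligned} \mathit{hvars}_0(\sigma) &= \mathit{Vars} \setminus \{x_1, x_2, x_3, x_4\},\\ \mathit{hvars}_1(\sigma) &= \mathit{Vars} \setminus \{x_1, x_3, x_4\},\\ \mathit{hvars}_2(\sigma) &= \mathit{Vars} \setminus \{x_3, x_4\}\\ &= \mathit{hvars}(\sigma). \end{aligned}$$

Thus, x_1 ∈ hvars(σ), although vars(x_1σ) ⊆ dom(σ).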
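Because Vars \ hvars_i(σ) is always a finite subset of dom(σ), the fixpoint of Definition 6 can be computed by shrinking that complement rather than by growing hvars itself. A minimal sketch, under the same hypothetical representation as the earlier examples (non_hvars and in_hvars are illustrative names):

```python
def vars_of(t):
    """Set of variables in t (variables are str, f(t1,...,tn) is a tuple)."""
    if isinstance(t, str):
        return {t}
    return set().union(*(vars_of(arg) for arg in t[1:]))

def non_hvars(sigma):
    """Return Vars minus hvars(sigma): the variables whose rational tree
    under sigma is infinite. The result is a subset of dom(sigma)."""
    not_finite = set(sigma)  # complement of hvars_0 is dom(sigma)
    while True:
        # y joins hvars_n when vars(y sigma) is contained in hvars_(n-1),
        # i.e. when vars(y sigma) avoids the current complement.
        newly_finite = {y for y in not_finite
                        if not (vars_of(sigma[y]) & not_finite)}
        if not newly_finite:
            return not_finite  # fixpoint reached
        not_finite -= newly_finite

def in_hvars(x, sigma):
    return x not in non_hvars(sigma)
```

On the substitution of Example 1 this yields {x3, x4}, matching hvars(σ) = Vars \ {x3, x4}.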

The abstraction function for H can then be defined in the obvious way. 

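Definition 7. (The Abstraction Function for H.) The abstraction function
α_H: RSubst → H is defined, for each σ ∈ RSubst, by

$$\alpha_H(\sigma) \stackrel{\mathrm{def}}{=} \mathit{VI} \cap \mathit{hvars}(\sigma).$$

The concrete domain D♭ is related to H by means of the abstraction function
α_H: ℘(RSubst) → H such that, for each Σ ∈ ℘(RSubst),

$$\alpha_H(\Sigma) \stackrel{\mathrm{def}}{=} \bigcap \{\, \alpha_H(\sigma) \mid \sigma \in \Sigma \,\}.$$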
Since the abstraction function α_H is additive, the concretization function is given
by its adjoint

$$\gamma_H(h) \stackrel{\mathrm{def}}{=} \{\, \sigma \in \mathit{RSubst} \mid \alpha_H(\sigma) \supseteq h \,\}.$$

With these definitions, we have the desired result: equivalent substitutions 
in rational solved form have the same finiteness abstraction. 

Theorem 1. If σ, τ ∈ RSubst and ℛ𝒯 ⊢ ∀(σ ↔ τ), then α_H(σ) = α_H(τ).



3.3 Abstract Unification on H × P



The abstract unification for the combined domain H × P is defined by using the
abstract predicates and functions as specified for P as well as a new finiteness 
predicate for the domain H . 



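Definition 8. (Abstract Unification on H × P.) A term t ∈ HTerms is a
finite tree in h if and only if the predicate hterm_h: HTerms → Bool holds for t,
where hterm_h(t) ≝ vars(t) ⊆ h.

The function amgu_H: (H × P) × Bind → H captures the effects of a binding
on an H element. Let (h, p) ∈ H × P and (x ↦ t) ∈ Bind. Then

$$\mathit{amgu}_H\bigl((h, p), x \mapsto t\bigr) \stackrel{\mathrm{def}}{=} h',$$

where

$$h' \stackrel{\mathrm{def}}{=} \begin{cases}
h \cup \mathit{vars}(t), & \text{if } \mathit{hterm}_h(x) \wedge \mathit{ground}_p(x);\\
h \cup \{x\}, & \text{if } \mathit{hterm}_h(t) \wedge \mathit{ground}_p(t);\\
h, & \text{if } \mathit{hterm}_h(x) \wedge \mathit{hterm}_h(t) \wedge \mathit{ind}_p(x, t) \wedge \mathit{or\_lin}_p(x, t);\\
h, & \text{if } \mathit{hterm}_h(x) \wedge \mathit{hterm}_h(t) \wedge \mathit{gfree}_p(x) \wedge \mathit{gfree}_p(t);\\
h \setminus \mathit{share\_same\_var}_p(x, t), & \text{if } \mathit{hterm}_h(x) \wedge \mathit{hterm}_h(t) \wedge \mathit{share\_lin}_p(x, t) \wedge \mathit{or\_lin}_p(x, t);\\
h \setminus \mathit{share\_with}_p(x), & \text{if } \mathit{hterm}_h(x) \wedge \mathit{lin}_p(x);\\
h \setminus \mathit{share\_with}_p(t), & \text{if } \mathit{hterm}_h(t) \wedge \mathit{lin}_p(t);\\
h \setminus \bigl(\mathit{share\_with}_p(x) \cup \mathit{share\_with}_p(t)\bigr), & \text{otherwise.}
\end{cases}$$

The abstract unification function amgu: (H × P) × Bind → H × P, for any
(h, p) ∈ H × P and (x ↦ t) ∈ Bind, is given by

$$\mathit{amgu}\bigl((h, p), x \mapsto t\bigr) \stackrel{\mathrm{def}}{=} \Bigl(\mathit{amgu}_H\bigl((h, p), x \mapsto t\bigr), \mathit{amgu}_P(p, x \mapsto t)\Bigr).$$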
In the computation of h' (the new finiteness component resulting from the
abstract evaluation of a binding) there are eight cases based on properties holding 
for the concrete terms described by x and t. 

1. In the first case, the concrete term described by x is both finite and ground. 
Thus, after a successful execution of the binding, any concrete term described 
by t will be finite. Note that t could have contained variables which may be 
possibly bound to cyclic terms just before the execution of the binding. 

2. The second case is symmetric to the first one. Note that these are the only 
cases when a “positive” propagation of finiteness information is correct. In 
contrast, in all the remaining cases, the goal is to limit as much as possible the 
propagation of “negative” information, i.e., the possible cyclicity of terms. 

3. The third case exploits the classical results proved in research work on
occurs-check reduction [...]. Accordingly, it is required that both x and t describe
finite terms that do not share. The use of the implicitly disjunctive predicate
or_lin_p allows for the application of this case even when neither x nor t
are known to be definitely linear. For instance, as observed in [...], this
may happen when the component P embeds the domain Pos for groundness
analysis.⁶

4. The fourth case exploits the observation that cyclic terms cannot be
created when unifying two finite terms that are either ground or free.
Ground-or-freeness [...] is a safe, more precise and inexpensive replacement for the
classical freeness property when combining sharing analysis domains.

5. The fifth case applies when unifying a linear and finite term with another 
finite term possibly sharing with it, provided they can only share linearly 
(namely, all the shared variables occur linearly in the considered terms). In 
such a context, only the shared variables can introduce cycles. 

6. In the sixth case, we drop the assumption about the finiteness of the term 
described by t. As a consequence, all variables sharing with x become possi- 
bly cyclic. However, provided x describes a finite and linear term, all finite 
variables independent from x preserve their finiteness. 

7. The seventh case is symmetric to the sixth one. 

8. The last case states that term finiteness is preserved for all variables that 
are independent from both x and t. Note that this case is only used when 
none of the other cases apply. 
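The case analysis of Definition 8 is a first-match selection over answers supplied by P. The sketch below assumes a hypothetical record PAnswers holding the results of the P-operators for the current binding x ↦ t (how those answers are computed depends on the chosen instance of P); the illustrative function amgu_h then returns h' by trying the eight cases in order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PAnswers:
    """Hypothetical container for the answers of the P-operators
    on a fixed binding x -> t (see Definition 4)."""
    ground_x: bool
    ground_t: bool
    ind: bool                  # ind_p(x, t)
    or_lin: bool               # or_lin_p(x, t)
    gfree_x: bool
    gfree_t: bool
    share_lin: bool            # share_lin_p(x, t)
    lin_x: bool
    lin_t: bool
    share_same_var: frozenset  # share_same_var_p(x, t)
    share_with_x: frozenset
    share_with_t: frozenset

def amgu_h(h, x, vars_t, p):
    """Return h' for the binding x -> t, where vars_t = vars(t)."""
    hterm_x = x in h              # hterm_h(x), since vars(x) = {x}
    hterm_t = set(vars_t) <= h    # hterm_h(t)
    if hterm_x and p.ground_x:
        return h | set(vars_t)                        # case 1
    if hterm_t and p.ground_t:
        return h | {x}                                # case 2
    if hterm_x and hterm_t and p.ind and p.or_lin:
        return set(h)                                 # case 3
    if hterm_x and hterm_t and p.gfree_x and p.gfree_t:
        return set(h)                                 # case 4
    if hterm_x and hterm_t and p.share_lin and p.or_lin:
        return h - p.share_same_var                   # case 5
    if hterm_x and p.lin_x:
        return h - p.share_with_x                     # case 6
    if hterm_t and p.lin_t:
        return h - p.share_with_t                     # case 7
    return h - (p.share_with_x | p.share_with_t)      # case 8
```

For the binding x ↦ f(x), with x free and linear, cases 1 to 4 fail and case 5 removes x from h, as expected for the cyclic term f(f(...)).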

The following result, together with the assumption on amgu_P as specified in
Definition 4, ensures that abstract unification on the combined domain H × P
is correct.

Theorem 2. Let (h, p) ∈ H × P and (x ↦ t) ∈ Bind, where {x} ∪ vars(t) ⊆ VI.
Let also σ ∈ γ_H(h) ∩ γ_P(p) and h' = amgu_H((h, p), x ↦ t). Then

$$\tau \in \mathit{mgs}\bigl(\sigma \cup \{x = t\}\bigr) \implies \tau \in \gamma_H(h').$$

⁶ Let t be y. Let also P be Pos. Then, given the Pos formula φ ≝ (x ∨ y), both
ind_φ(x, y) and or_lin_φ(x, y) satisfy the conditions in Definition 4. Note that from φ
we can infer neither that x is definitely linear nor that y is definitely linear.
4 Ongoing and Further Work

4.1 An Instance of the Parameter Domain P 

As discussed in Section 3, several abstract domains for sharing analysis can be
used to implement the parameter component P. One could consider the
well-known set-sharing domain of Jacobs and Langen [...]. In such a case, all the
non-trivial correctness results have already been established in [...]: in
particular, the abstraction function provided in [...] satisfies the requirement of
Definition 3, and the abstract unification operator has been proven correct with
respect to rational-tree unification. Note however that, since no freeness and
linearity information is recorded in the plain set-sharing domain, some of the
predicates of Definition 4 need to be grossly approximated.

Therefore, a better choice would be to consider the abstract domain SFL [...]
(see also [...]) that represents possible sharing. This domain incorporates the
set-sharing domain of Jacobs and Langen with definite freeness and linearity
information; the information being encoded by two sets of variables, one satisfying
the property of freeness and the other, the property of linearity.

Definition 9. (The Set-Sharing Domain SH.) The set SH is defined by
SH ≝ ℘(SG), where SG ≝ ℘(VI) \ {∅} is the set of sharing groups. SH is
ordered by subset inclusion.



Definition 10. (The Domain SFL.) Let F ≝ ℘(VI) and L ≝ ℘(VI) be
partially ordered by reverse subset inclusion. The domain SFL is defined by the
Cartesian product SFL ≝ SH × F × L ordered by ‘≤_S’, the component-wise
extension of the orderings defined on the sub-domains.

Note that a complete definition, besides explicitly dealing with the set of
relevant variables VI, would require the addition of a bottom element ⊥ representing
the semantics of those program fragments that have no successful computations.

In the next definition we introduce a few well-known operations on the set-sharing
domain SH. These will be used to define the operations on the domain
SFL.

Definition 11. (Abstract Operators on SH.) For each $sh \in SH$ and each
$V \subseteq VI$, the extraction of the relevant component of $sh$ with respect to $V$ is
given by the function $\mathrm{rel} \colon \wp(VI) \times SH \to SH$ defined as

$\mathrm{rel}(V, sh) \stackrel{\mathrm{def}}{=} \{\, S \in sh \mid S \cap V \neq \emptyset \,\}$.

For each $sh \in SH$ and each $V \subseteq VI$, the function $\overline{\mathrm{rel}} \colon \wp(VI) \times SH \to SH$
gives the irrelevant component of $sh$ with respect to $V$. It is defined by

$\overline{\mathrm{rel}}(V, sh) \stackrel{\mathrm{def}}{=} sh \setminus \mathrm{rel}(V, sh)$.



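The function $(\cdot)^{*} \colon SH \to SH$, called star-union, is given, for each $sh \in SH$, by

$sh^{*} \stackrel{\mathrm{def}}{=} \bigl\{\, S \in SG \bigm| \exists n \geq 1 \mathrel{.} \exists T_1, \ldots, T_n \in sh \mathrel{.} S = \bigcup_{i=1}^{n} T_i \,\bigr\}$.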



For each $sh_1, sh_2 \in SH$, the function $\mathrm{bin} \colon SH \times SH \to SH$, called binary
union, is given by

$\mathrm{bin}(sh_1, sh_2) \stackrel{\mathrm{def}}{=} \{\, S_1 \cup S_2 \mid S_1 \in sh_1, S_2 \in sh_2 \,\}$.
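The operators of Definition 11 can be executed directly on finite sets. The following Java sketch is only an illustration under our own representation choices: a sharing group is a `Set<String>` of variable names, an element of SH is a set of such groups, and the class `SetSharing` and its method names are hypothetical. The star-union is obtained as a fixpoint of repeated binary unions, which is one straightforward way of producing every union of one or more groups.

```java
import java.util.*;

/**
 * Sketch of the set-sharing operators of Definition 11. A sharing group is a
 * Set<String> of variable names; an element sh of SH is a set of such groups.
 * Class and method names are illustrative only.
 */
final class SetSharing {
    /** rel(V, sh): the groups of sh containing at least one variable of V. */
    static Set<Set<String>> rel(Set<String> v, Set<Set<String>> sh) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s : sh) {
            if (!Collections.disjoint(s, v)) out.add(s);
        }
        return out;
    }

    /** relBar(V, sh) = sh \ rel(V, sh): the groups not touching V. */
    static Set<Set<String>> relBar(Set<String> v, Set<Set<String>> sh) {
        Set<Set<String>> out = new HashSet<>(sh);
        out.removeAll(rel(v, sh));
        return out;
    }

    /** bin(sh1, sh2): every union of a group of sh1 with a group of sh2. */
    static Set<Set<String>> bin(Set<Set<String>> sh1, Set<Set<String>> sh2) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s1 : sh1) {
            for (Set<String> s2 : sh2) {
                Set<String> u = new HashSet<>(s1);
                u.addAll(s2);
                out.add(u);
            }
        }
        return out;
    }

    /** sh*: all unions of one or more groups of sh, computed as a fixpoint. */
    static Set<Set<String>> star(Set<Set<String>> sh) {
        Set<Set<String>> result = new HashSet<>(sh);
        while (true) {
            Set<Set<String>> next = new HashSet<>(result);
            next.addAll(bin(result, sh));   // extend each union by one more group
            if (next.equals(result)) return result;
            result = next;
        }
    }
}
```

Since sh* may contain up to 2^n - 1 groups when sh has n groups (seven for three singleton groups), a direct computation of this kind is only practical for small arguments.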



It is now possible to define the implementation, on the domain SFL, of all
the predicates and functions specified in Definition 4.

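Definition 12. (Abstract Operators on SFL.) For each $d = (sh, f, l) \in SFL$,
for each $s, t \in HTerms$, where $\mathrm{vars}(s) \cup \mathrm{vars}(t) \subseteq VI$, let $R_s = \mathrm{rel}(\mathrm{vars}(s), sh)$
and $R_t = \mathrm{rel}(\mathrm{vars}(t), sh)$. Then

$$
\begin{aligned}
\mathrm{ind}_d(s, t) &\stackrel{\mathrm{def}}{=} (R_s \cap R_t = \emptyset);\\
\mathrm{ground}_d(t) &\stackrel{\mathrm{def}}{=} \bigl(\mathrm{vars}(t) \subseteq VI \setminus \mathrm{vars}(sh)\bigr);\\
\mathrm{occ\_lin}_d(y, t) &\stackrel{\mathrm{def}}{=} \mathrm{ground}_d(y) \lor \Bigl(\mathrm{occ\_lin}(y, t) \land (y \in l) \land \forall z \in \mathrm{vars}(t) : \bigl(y \neq z \implies \mathrm{ind}_d(y, z)\bigr)\Bigr);\\
\mathrm{share\_lin}_d(s, t) &\stackrel{\mathrm{def}}{=} \forall y \in \mathrm{vars}(R_s \cap R_t) : \bigl(y \in \mathrm{vars}(s) \implies \mathrm{occ\_lin}_d(y, s)\bigr) \land \bigl(y \in \mathrm{vars}(t) \implies \mathrm{occ\_lin}_d(y, t)\bigr);\\
\mathrm{free}_d(t) &\stackrel{\mathrm{def}}{=} \exists y \in VI \mathrel{.} (y = t) \land (y \in f);\\
\mathrm{gfree}_d(t) &\stackrel{\mathrm{def}}{=} \mathrm{ground}_d(t) \lor \mathrm{free}_d(t);\\
\mathrm{lin}_d(t) &\stackrel{\mathrm{def}}{=} \forall y \in \mathrm{vars}(t) : \mathrm{occ\_lin}_d(y, t);\\
\mathrm{or\_lin}_d(s, t) &\stackrel{\mathrm{def}}{=} \mathrm{lin}_d(s) \lor \mathrm{lin}_d(t);\\
\mathrm{share\_same\_var}_d(s, t) &\stackrel{\mathrm{def}}{=} \mathrm{vars}(R_s \cap R_t);\\
\mathrm{share\_with}_d(t) &\stackrel{\mathrm{def}}{=} \mathrm{vars}(R_t).
\end{aligned}
$$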

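The function $\mathrm{amgu}_S \colon SFL \times Bind \to SFL$ captures the effects of a binding
on an element of SFL. Let $d = (sh, f, l) \in SFL$ and $(x \mapsto t) \in Bind$, where
$V_{xt} = \{x\} \cup \mathrm{vars}(t) \subseteq VI$. Let $R_x = \mathrm{rel}(\{x\}, sh)$ and $R_t = \mathrm{rel}(\mathrm{vars}(t), sh)$. Let
also

$$
\begin{aligned}
sh' &\stackrel{\mathrm{def}}{=} \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}(S_x, S_t),\\
S_x &\stackrel{\mathrm{def}}{=} \begin{cases} R_x, & \text{if } \mathrm{free}_d(x) \lor \mathrm{free}_d(t) \lor \bigl(\mathrm{lin}_d(t) \land \mathrm{ind}_d(x, t)\bigr);\\ R_x^{*}, & \text{otherwise;} \end{cases}\\
S_t &\stackrel{\mathrm{def}}{=} \begin{cases} R_t, & \text{if } \mathrm{free}_d(x) \lor \mathrm{free}_d(t) \lor \bigl(\mathrm{lin}_d(x) \land \mathrm{ind}_d(x, t)\bigr);\\ R_t^{*}, & \text{otherwise;} \end{cases}\\
f' &\stackrel{\mathrm{def}}{=} \begin{cases} f, & \text{if } \mathrm{free}_d(x) \land \mathrm{free}_d(t);\\ f \setminus \mathrm{vars}(R_x), & \text{if } \mathrm{free}_d(x);\\ f \setminus \mathrm{vars}(R_t), & \text{if } \mathrm{free}_d(t);\\ f \setminus \mathrm{vars}(R_x \cup R_t), & \text{otherwise;} \end{cases}\\
l' &\stackrel{\mathrm{def}}{=} \bigl(VI \setminus \mathrm{vars}(sh')\bigr) \cup f' \cup l'',\\
l'' &\stackrel{\mathrm{def}}{=} \begin{cases} l \setminus \bigl(\mathrm{vars}(R_x) \cap \mathrm{vars}(R_t)\bigr), & \text{if } \mathrm{lin}_d(x) \land \mathrm{lin}_d(t);\\ l \setminus \mathrm{vars}(R_x), & \text{if } \mathrm{lin}_d(x);\\ l \setminus \mathrm{vars}(R_t), & \text{if } \mathrm{lin}_d(t);\\ l \setminus \mathrm{vars}(R_x \cup R_t), & \text{otherwise.} \end{cases}
\end{aligned}
$$

Then $\mathrm{amgu}_S(d, x \mapsto t) \stackrel{\mathrm{def}}{=} (sh', f', l')$.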
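The case analysis of Definition 12 is easier to follow when executed. The Java sketch below is a hypothetical rendering, not the authors' implementation. It assumes, as a simplification of ours, that a term is abstracted by whether it is a variable and by the list of its variable occurrences (repetitions kept), which is all the SFL operators inspect; the records `Term` and `Sfl`, the parameter `vi` standing for VI, and all method names are illustrative.

```java
import java.util.*;

/**
 * Sketch of the SFL predicates and of amgu_S from Definition 12. A term is
 * abstracted by whether it is a variable and by the list of its variable
 * occurrences (repetitions kept). All names here are illustrative.
 */
final class SflSketch {
    record Term(boolean isVar, List<String> occ) {
        Set<String> vars() { return new HashSet<>(occ); }
    }

    /** An element (sh, f, l) of SFL. */
    record Sfl(Set<Set<String>> sh, Set<String> f, Set<String> l) {}

    static Set<String> vars(Set<Set<String>> sh) {
        Set<String> out = new HashSet<>();
        for (Set<String> s : sh) out.addAll(s);
        return out;
    }

    static Set<Set<String>> rel(Set<String> v, Set<Set<String>> sh) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s : sh) if (!Collections.disjoint(s, v)) out.add(s);
        return out;
    }

    static Set<Set<String>> bin(Set<Set<String>> a, Set<Set<String>> b) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s1 : a) {
            for (Set<String> s2 : b) {
                Set<String> u = new HashSet<>(s1);
                u.addAll(s2);
                out.add(u);
            }
        }
        return out;
    }

    static Set<Set<String>> star(Set<Set<String>> sh) {            // closure under union
        Set<Set<String>> r = new HashSet<>(sh);
        while (true) {
            Set<Set<String>> next = new HashSet<>(r);
            next.addAll(bin(r, sh));
            if (next.equals(r)) return r;
            r = next;
        }
    }

    static boolean ind(Sfl d, Set<String> s, Set<String> t) {       // R_s, R_t disjoint
        return Collections.disjoint(rel(s, d.sh()), rel(t, d.sh()));
    }

    static boolean ground(Sfl d, Set<String> vs) {                  // vs avoids vars(sh)
        return Collections.disjoint(vs, vars(d.sh()));
    }

    static boolean free(Sfl d, Term t) {                            // t is a variable in f
        return t.isVar() && d.f().contains(t.occ().get(0));
    }

    static boolean occLin(Sfl d, String y, Term t) {
        if (ground(d, Set.of(y))) return true;
        if (Collections.frequency(t.occ(), y) != 1 || !d.l().contains(y)) return false;
        for (String z : t.vars()) {
            if (!z.equals(y) && !ind(d, Set.of(y), Set.of(z))) return false;
        }
        return true;
    }

    static boolean lin(Sfl d, Term t) {
        for (String y : t.vars()) if (!occLin(d, y, t)) return false;
        return true;
    }

    /** amgu_S(d, x -> t), where vi is the set VI of variables of interest. */
    static Sfl amgu(Sfl d, Set<String> vi, String x, Term t) {
        Term tx = new Term(true, List.of(x));
        Set<String> vxt = t.vars();
        vxt.add(x);
        Set<Set<String>> rx = rel(Set.of(x), d.sh()), rt = rel(t.vars(), d.sh());
        boolean fx = free(d, tx), ft = free(d, t), lx = lin(d, tx), lt = lin(d, t);
        boolean indXT = ind(d, Set.of(x), t.vars());

        Set<Set<String>> sx = (fx || ft || (lt && indXT)) ? rx : star(rx);
        Set<Set<String>> st = (fx || ft || (lx && indXT)) ? rt : star(rt);
        Set<Set<String>> sh1 = new HashSet<>(d.sh());
        sh1.removeAll(rel(vxt, d.sh()));                            // relBar(V_xt, sh)
        sh1.addAll(bin(sx, st));

        Set<String> f1 = new HashSet<>(d.f());
        if (fx && ft) { /* f' = f */ }
        else if (fx) f1.removeAll(vars(rx));
        else if (ft) f1.removeAll(vars(rt));
        else { f1.removeAll(vars(rx)); f1.removeAll(vars(rt)); }

        Set<String> l2 = new HashSet<>(d.l());
        if (lx && lt) { Set<String> both = vars(rx); both.retainAll(vars(rt)); l2.removeAll(both); }
        else if (lx) l2.removeAll(vars(rx));
        else if (lt) l2.removeAll(vars(rt));
        else { l2.removeAll(vars(rx)); l2.removeAll(vars(rt)); }

        Set<String> l1 = new HashSet<>(vi);                          // (VI minus vars(sh')) plus f', l''
        l1.removeAll(vars(sh1));
        l1.addAll(f1);
        l1.addAll(l2);
        return new Sfl(sh1, f1, l1);
    }
}
```

For example, with VI = {x, y, z}, sh = {{x}, {y}, {z}} and every variable free and linear, this sketch maps the binding x ↦ f(y, z) to sh' = {{x, y}, {x, z}}, f' = {y, z} and l' = {x, y, z}; for the cyclic binding x ↦ f(x) on ({{x}}, {x}, {x}) it returns ({{x}}, ∅, ∅), so x is no longer known to be free or linear.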



It is worth noting that, when observing the term-finiteness property, set-sharing
is strictly more precise than pair-sharing, since a set-sharing domain is strictly
more precise when computing the functions $\mathrm{share\_same\_var}_P$ and $\mathrm{share\_lin}_P$.
This observation holds regardless of the pair-sharing variant considered, including
ASub […], PSD […] and […].

It remains for us to establish that the relations and functions given in
Definition 12 satisfy all the requirements of Definitions […] and […]. This will require
a proof of the correctness, with respect to rational unification, of the abstract
operators defined on the domain SFL, thereby generalizing and extending the
results proved in [24] for the set-sharing domain of Jacobs and Langen.

Note that the domain SFL is not the only target of the generic specification given
in Definition 4: more powerful sharing domains can also satisfy this schema,
including all the enhanced combinations considered in [5]. For instance, as the
predicate $\mathrm{gfree}_d$ defined on SFL does not fully exploit the disjunctive nature of
its generic specification $\mathrm{gfree}_P$, the precision of the analysis may be improved by
adding a domain component explicitly tracking ground-or-freeness, as proposed
in [5]. The same argument applies to the predicate $\mathrm{or\_lin}_d$, with respect to $\mathrm{or\_lin}_P$,
when considering the combination with the groundness domain Pos.

In order to provide an experimental evaluation of the proposed finiteness
analysis, we are implementing $H \times P$, where the P component is the SFL domain
extended with some of the enhancements described in [5]. One of these

For the expert: consider the abstract evaluation of the binding $x \mapsto y$ and the
description $(h, d) \in H \times SFL$, where $h = \{x, y, z\}$ and $d = (sh, f, l)$ is such that
$sh = \{\{x, y\}, \{x, z\}, \{y, z\}\}$, $f = \emptyset$ and $l = \{x, y, z\}$. Then $z \notin \mathrm{share\_same\_var}_d(x, y)$,
so that we have $h' = \{z\}$. In contrast, when using a pair-sharing domain such as
PSD, the element $d$ is equivalent to $d' = (sh', f, l)$, where $sh' = sh \cup \{\{x, y, z\}\}$.
Hence we have $z \in \mathrm{share\_same\_var}_{d'}(x, y)$ and $h' = \emptyset$. Thus, in $sh$ the information
provided by the lack of the sharing group $\{x, y, z\}$ is redundant when observing
pair-sharing and groundness, but it is not redundant when observing term finiteness.



Finite- Tree Analysis for Constraint Logic-Based Languages 181 



enhancements uses information about the actual structure of terms. It has been
shown in [2] that this structural information, provided by integrating the generic
Pattern(·) construction with SFL, can play a key role in improving the precision
of sharing analysis and, in particular, in better identifying where cyclic
structures may appear. Thus, it is expected that structural information captured
using $\mathrm{Pattern}(H \times P)$ can improve the precision of finite-tree analysis, both with
respect to the parametric component P and the finiteness component H itself.
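The expert example given in the footnote of this section (the binding x ↦ y with h = {x, y, z} and sh = {{x, y}, {x, z}, {y, z}}) can be replayed mechanically. In the hypothetical Java sketch below, `shareSameVar` computes vars(R_s ∩ R_t) from vars(s) and vars(t); it is applied both to sh and to sh ∪ {{x, y, z}}, the description that a pair-sharing domain cannot distinguish from sh. As the footnote does for this particular example, the result is then removed from h; we do not claim that this subtraction is the whole definition of the finiteness update.

```java
import java.util.*;

/** Replays the footnote example on set-sharing; names are illustrative. */
final class ShareSameVar {
    static Set<Set<String>> rel(Set<String> v, Set<Set<String>> sh) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s : sh) if (!Collections.disjoint(s, v)) out.add(s);
        return out;
    }

    /** share_same_var_d(s, t) = vars(R_s intersected with R_t), given vars(s) and vars(t). */
    static Set<String> shareSameVar(Set<Set<String>> sh, Set<String> s, Set<String> t) {
        Set<Set<String>> common = rel(s, sh);
        common.retainAll(rel(t, sh));           // groups touching both s and t
        Set<String> out = new HashSet<>();
        for (Set<String> g : common) out.addAll(g);
        return out;
    }

    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> out = new HashSet<>(a);
        out.removeAll(b);
        return out;
    }
}
```

With sh the only group containing both x and y is {x, y}, so z survives in h'; adding the group {x, y, z}, which pair-sharing cannot rule out, makes z share with both and empties h'.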

4.2 Term-Finiteness Dependencies 

The parametric domain $H \times P$ captures the negative aspect of term-finiteness,
that is, the circumstances under which finiteness can be lost. When a binding
has the potential for creating one or more rational terms, the operator $\mathrm{amgu}_H$
removes from h all the variables that may be bound to non-finite terms. However,
term-finiteness also has a positive aspect: there are cases where a variable is
guaranteed to be bound to a finite term and this knowledge can be propagated
to other variables. Guarantees of finiteness are provided by several built-ins like
unify_with_occurs_check/2, var/1, name/2, all the arithmetic predicates, and
so forth. SICStus Prolog also provides an explicit acyclic_term/1 predicate.
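In a rational-tree system a term can be viewed as a possibly cyclic graph of nodes, and the term is finite exactly when no cycle is reachable from its root. The following Java sketch illustrates the kind of test that a predicate like acyclic_term/1 performs; it is not SICStus Prolog's implementation, and the `Node` class is a hypothetical representation. A depth-first search distinguishes nodes on the current path (meeting one again means a cycle) from nodes already fully explored, so that shared subterms are not mistaken for cycles.

```java
import java.util.*;

/**
 * A term as a graph of nodes; rational trees correspond to graphs with cycles.
 * Node is a hypothetical representation chosen for illustration.
 */
final class TermGraph {
    static final class Node {
        final String functor;
        final List<Node> args = new ArrayList<>();
        Node(String functor) { this.functor = functor; }
    }

    /** True iff no cycle is reachable from root, i.e. the term is a finite tree. */
    static boolean isAcyclic(Node root) {
        Set<Node> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Node> finished = Collections.newSetFromMap(new IdentityHashMap<>());
        return dfs(root, onPath, finished);
    }

    private static boolean dfs(Node n, Set<Node> onPath, Set<Node> finished) {
        if (finished.contains(n)) return true;   // shared subterm, already known finite
        if (!onPath.add(n)) return false;        // back edge: n occurs inside itself
        for (Node a : n.args) {
            if (!dfs(a, onPath, finished)) return false;
        }
        onPath.remove(n);
        finished.add(n);
        return true;
    }
}
```

For instance, f(a, a) built with a single shared node for a is reported finite, while the node for X = f(X), whose only argument is itself, is reported non-finite.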

The term-finiteness information provided by the h component of $H \times P$ does
not capture the information concerning how finiteness of one variable affects the
finiteness of other variables. This kind of information, usually termed relational
information, is very important as it allows the propagation of positive finiteness
information. An important source of relational information comes from dependencies.
Consider the terms $t_1 = f(x)$, $t_2 = g(y)$, and $t_3 = h(x, y)$: it is clear
that, for each assignment of rational terms to x and y, $t_3$ is finite if and only if
$t_1$ and $t_2$ are so. We can capture this by the Boolean formula $t_3 \leftrightarrow (t_1 \land t_2)$. The
reasoning is based on the following facts:

1. $t_1$, $t_2$, and $t_3$ are finite terms, so that the finiteness of their instances depends
only on the finiteness of the terms that take the place of x and y.

2. $t_3$ covers both $t_1$ and $t_2$, that is, $\mathrm{vars}(t_3) \supseteq \mathrm{vars}(t_1) \cup \mathrm{vars}(t_2)$; this means
that, if an assignment to the variables of $t_3$ produces a finite instance of $t_3$,
that very same assignment will necessarily result in finite instances of $t_1$ and
$t_2$. Conversely, an assignment producing non-finite instances of $t_1$ or $t_2$ will
inevitably result in a non-finite instance of $t_3$.

3. Similarly, $t_1$ and $t_2$, taken together, cover $t_3$.

The important point to notice is that the indicated dependency will continue to
hold for any further simultaneous instantiation of $t_1$, $t_2$, and $t_3$. In other words,
such dependencies are preserved by forward computations (since they proceed
by consistently instantiating program variables).

Consider the abstract binding $x \mapsto t$, where t is a finite term such that
$\mathrm{vars}(t) = \{y_1, \ldots, y_n\}$. After this binding has been successfully performed, the
destinies of x and t concerning term-finiteness are tied together forever. This tie
can be described by the dependency formula

$$x \leftrightarrow (y_1 \land \cdots \land y_n), \qquad (1)$$
meaning that x will be bound to a finite term if and only if, for each $i = 1, \ldots, n$,
$y_i$ is bound to a finite term. While the dependency expressed by (1) is
a correct description of any computation state following the application of the
binding $x \mapsto t$, it is not as precise as it could be. Suppose that x and $y_k$ are
indeed the same variable. Then (1) is logically equivalent to

$$x \to (y_1 \land \cdots \land y_{k-1} \land y_{k+1} \land \cdots \land y_n). \qquad (2)$$

Correct: whenever x is bound to a finite term, all the other variables will be
bound to finite terms. The point is that x has just been bound to a non-finite
term, irrevocably: no forward computation can change this. Thus, the implication
(2) holds vacuously. The precise and correct description for the state of affairs
caused by the cyclic binding is, instead, the negated atom $\neg x$, whose intuitive
reading is “x is not (and never will be) finite.”
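The step from (1) to (2) can be checked by brute force. In the hypothetical Java sketch below each Boolean variable reads "is bound to a finite term"; taking n = 3 and letting y_1 be x, it compares x ↔ (x ∧ y_2 ∧ y_3) with x → (y_2 ∧ y_3) on all eight truth assignments. The same table shows why (2) is vacuous after a cyclic binding: on every row where x is false, (2) is true whatever y_2 and y_3 are, so only ¬x records what is actually known.

```java
/**
 * Truth-table check of the step from (1) to (2), taking n = 3 and y1 = x.
 * Each variable reads "is bound to a finite term". Names are illustrative.
 */
final class FiniteDeps {
    static boolean iff(boolean a, boolean b) { return a == b; }

    static boolean implies(boolean a, boolean b) { return !a || b; }

    /** (1) with y1 = x, i.e. x <-> (x & y2 & y3), versus (2), x -> (y2 & y3). */
    static boolean collapsesToImplication() {
        for (int m = 0; m < 8; m++) {
            boolean x = (m & 1) != 0, y2 = (m & 2) != 0, y3 = (m & 4) != 0;
            if (iff(x, x && y2 && y3) != implies(x, y2 && y3)) return false;
        }
        return true;
    }
}
```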

Following the intuition outlined above, in [4] we have studied a domain,
whose carrier is the set of all Boolean functions, for representing and propagating
finiteness dependencies. We believe that coupling this new domain with $H \times P$
can greatly improve the precision of the analysis.

5 Conclusion 

Several modern logic-based languages offer a computation domain based on ra- 
tional trees. On the one hand, the use of such trees is encouraged by the possi- 
bility of using efficient and correct unification algorithms and by an increase in 
expressivity. On the other hand, these gains are countered by the extra problems
rational trees bring with them, which can be summarized as follows: several
built-ins, library predicates, and program analysis and manipulation techniques
are only well-defined for program fragments working with finite trees.

In this paper we propose an abstract-interpretation based solution to the 
problem of detecting program variables that can only be bound to finite terms. 
The rationale behind this is that applications exploiting rational trees tend to 
do so in a very controlled way. If the analysis we propose proves to be precise 
enough, then we will have a practical way of taking advantage of rational trees 
while minimizing the impact of their disadvantages. 



References 

1. R. Bagnara, R. Gori, P. M. Hill, and E. Zaffanella. Finite-tree analysis for con- 
straint logic-based languages. Quaderno 251, Dipartimento di Matematica, Uni- 
versita di Parma, 2001. Available at http://www.cs.unipr.it/~bagnara/ 

2. R. Bagnara, P. M. Hill, and E. Zaffanella. Efficient structural information analysis
for real CLP languages. In M. Parigot and A. Voronkov, editors, Proceedings of the
7th International Conference on Logic for Programming and Automated Reasoning
(LPAR 2000), volume 1955 of Lecture Notes in Computer Science, pages 189-206,
Reunion Island, France, 2000. Springer-Verlag, Berlin.






3. R. Bagnara, P. M. Hill, and E. Zaffanella. Set-sharing is redundant for pair-sharing. 
Theoretical Computer Science, 2001. To appear. 

4. R. Bagnara, E. Zaffanella, R. Gori, and P. M. Hill. Boolean functions for finite-tree 
dependencies. Quaderno 252, Dipartimento di Matematica, Universita di Parma, 
2001. Available at http://www.cs.unipr.it/~bagnara/ 

5. R. Bagnara, E. Zaffanella, and P. M. Hill. Enhanced sharing analysis techniques: A
comprehensive evaluation. In M. Gabbrielli and F. Pfenning, editors, Proceedings
of the 2nd International ACM SIGPLAN Conference on Principles and Practice
of Declarative Programming, pages 103-114, Montreal, Canada, 2000. Association
for Computing Machinery.

6. M. Bruynooghe, M. Codish, and A. Mulkers. A composite domain for freeness, 
sharing, and compoundness analysis of logic programs. Technical Report CW 196, 
Department of Computer Science, K.U. Leuven, Belgium, July 1994. 

7. J. A. Campbell, editor. Implementations of Prolog. Ellis Horwood/Halsted 
Press/Wiley, 1984. 

8. B. Carpenter. The Logic of Typed Feature Structures with Applications to 
Unification-based Grammars, Logic Programming and Constraint Resolution, vol- 
ume 32 of Cambridge Tracts in Theoretical Computer Science. Cambridge Univer- 
sity Press, New York, 1992. 

9. M. Codish, D. Dams, and E. Yardeni. Derivation and safety of an abstract unification
algorithm for groundness and aliasing analysis. In K. Furukawa, editor,
Logic Programming: Proceedings of the Eighth International Conference on Logic
Programming, MIT Press Series in Logic Programming, pages 79-93, Paris, France,
1991. The MIT Press.

10. A. Colmerauer. Prolog and infinite trees. In K. L. Clark and S. A. Tärnlund,
editors, Logic Programming, APIC Studies in Data Processing, volume 16, pages
231-251. Academic Press, New York, 1982.

11. A. Colmerauer. Equations and inequations on finite and infinite trees. In Pro- 
ceedings of the International Conference on Fifth Generation Computer Systems 
(FGCS’84), pages 85-99, Tokyo, Japan, 1984. ICOT. 

12. A. Colmerauer. An introduction to Prolog-III. Communications of the ACM,
33(7):69-90, 1990.

13. A. Cortesi and G. Filé. Sharing is optimal. Journal of Logic Programming,
38(3):371-386, 1999.

14. A. Cortesi, B. Le Charlier, and P. Van Hentenryck. Combinations of abstract 
domains for logic programming: Open product and generic pattern construction. 
Science of Computer Programming, 38(1-3), 2000. 

15. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static 
analysis of programs by construction or approximation of fixpoints. In Proceedings 
of the Fourth Annual ACM Symposium on Principles of Programming Languages, 
pages 238-252, 1977. 

16. P. Cousot and R. Cousot. Abstract interpretation frameworks. Journal of Logic
and Computation, 2(4):511-547, 1992.

17. L. Crnogorac, A. D. Kelly, and H. Søndergaard. A comparison of three occur-check
analysers. In R. Cousot and D. A. Schmidt, editors, Static Analysis: Proceedings
of the 3rd International Symposium, volume 1145 of Lecture Notes in Computer
Science, pages 159-173, Aachen, Germany, 1996. Springer-Verlag, Berlin.

18. P. R. Eggert and K. P. Chow. Logic programming, graphics and infinite terms. 
Technical Report UCSB DoCS TR 83-02, Department of Computer Science, Uni- 
versity of California at Santa Barbara, 1983. 



184 Roberto Bagnara et al. 



19. G. Erbach. ProFIT: Prolog with Features, Inheritance and Templates. In Pro- 
ceedings of the 7th Conference of the European Chapter of the Association for 
Computational Linguistics, pages 180-187, Dublin, Ireland, 1995. 

20. M. Filgueiras. A Prolog interpreter working with infinite terms. In Campbell [7],
pages 250-258.

21. F. Giannesini and J. Cohen. Parser generation and grammar manipulation using 
Prolog’s infinite trees. Journal of Logic Programming, 3:253-265, 1984. 

22. W. Hans and S. Winkler. Aliasing and groundness analysis of logic programs 
through abstract interpretation and its safety. Technical Report 92-27, Technical 
University of Aachen (RWTH Aachen), 1992. 

23. S. Haridi and D. Sahlin. Efficient implementation of unification of cyclic structures.
In Campbell [7], pages 234-249.

24. P. M. Hill, R. Bagnara, and E. Zaffanella. Soundness, idempotence and commuta- 
tivity of set-sharing. Theory and Practice of Logic Programming, 2001. To appear. 
Available at http://arXiv.org/abs/cs.PL/0102030. 

25. B. Intrigila and M. Venturini Zilli. A remark on infinite matching vs infinite 
unification. Journal of Symbolic Computation, 21(3):2289-2292, 1996. 

26. D. Jacobs and A. Langen. Accurate and efficient approximation of variable aliasing
in logic programs. In E. L. Lusk and R. A. Overbeek, editors, Logic Programming:
Proceedings of the North American Conference, MIT Press Series in Logic Programming,
pages 154-165, Cleveland, Ohio, USA, 1989. The MIT Press.

27. J. Jaffar, J-L. Lassez, and M. J. Maher. Prolog-II as an instance of the logic
programming scheme. In M. Wirsing, editor, Formal Descriptions of Programming
Concepts III, pages 275-299. North-Holland, 1987.

28. T. Keisu. Tree Constraints. PhD thesis, The Royal Institute of Technology, Stockholm,
Sweden, May 1994.

29. A. King. Pair-sharing over rational trees. Journal of Logic Programming, 46(1- 
2):139-155, 2000. 

30. M. J. Maher. Complete axiomatizations of the algebras of finite, rational and 
infinite trees. In Proceedings, Third Annual Symposium on Logic in Computer 
Science, pages 348-357, Edinburgh, Scotland, 1988. IEEE Computer Society. 

31. K. Mukai. Constraint Logic Programming and the Unification of Information. PhD
thesis, Department of Computer Science, Faculty of Engineering, Tokyo Institute
of Technology, 1991.

32. C. Pollard and I. A. Sag. Head-Driven Phrase Structure Grammar. University of
Chicago Press, Chicago, 1994.

33. F. Scozzari. Abstract domains for sharing analysis by optimal semantics. In J. Palsberg,
editor, Static Analysis: 7th International Symposium, SAS 2000, volume 1824
of Lecture Notes in Computer Science, pages 397-412, Santa Barbara, CA, USA,
2000. Springer-Verlag, Berlin.

34. G. Smolka and R. Treinen. Records for logic programming. Journal of Logic
Programming, 18(3):229-258, 1994.

35. H. Søndergaard. An application of abstract interpretation of logic programs: Occur
check reduction. In B. Robinet and R. Wilhelm, editors, Proceedings of the 1986
European Symposium on Programming, volume 213 of Lecture Notes in Computer
Science, pages 327-338. Springer-Verlag, Berlin, 1986.

36. Swedish Institute of Computer Science, Programming Systems Group. SICStus 
Prolog User’s Manual, release 3 ffO edition, 1995. 



Applications of Extended Static Checking 



K. Rustan M. Leino 

Compaq Systems Research Center 
130 Lytton Ave., Palo Alto, CA 94301, USA 
rustan.leino@compaq.com



Abstract. Extended static checking is a powerful program analysis tech- 
nique. It translates into a logical formula the hypothesis that a given 
program has some particular desirable properties. The logical formula, 
called a verification condition, is then checked with an automatic theo- 
rem prover. The extended static checking technique has been built into a 
couple of program checkers. This paper discusses other possible applica- 
tions of the technique to the problem of producing quality software more 
quickly. 



1 Extended Static Checking 

The use of software plays a large role in our lives, directly and indirectly. We 
wish it were cheaper to create software, easier to produce software that is correct, 
and quicker to get to market with new or updated software. A research goal is 
therefore to improve program quality and programmer productivity. 

Many attempts have been made to increase productivity in software develop- 
ment. Perhaps the most successful attempts have been ones that have influenced 
the design of programming languages. A language can force programs to follow 
certain programming disciplines. For example, the discipline of storing in each 
variable values of only one type has been enormously successful. It has led to 
the development of type systems, type declarations, and type checkers. In fact, 
type checking has become such a standard occurrence in popular languages that 
we tend to think of type errors as simple typographic errors rather than think- 
ing about the great damage that such errors might have caused at run time. A 
type checker is usually applied at the time a program is compiled, which means
that the errors it catches are found early in the development cycle; this drastically
decreases the cost of the errors.

Another example of a programming discipline that has influenced the vast 
majority of programs being written today is the abstinence from use of arbitrary 
control-flow jumps. Useful jump disciplines have been incorporated into, for example,
if statements, loops, exceptions, and dynamically dispatched method calls,
whereas arbitrary goto statements have ceased to be supported. Because mod- 
ern high-level languages do not provide constructs for doing arbitrary jumps, 
the programming errors that stem from arbitrary jumps can be avoided alto- 
gether. In this way, good programming disciplines have influenced the design of 
programming languages, which in turn have a strong influence on the kinds of
programs programmers write and the kinds of errors they are likely to introduce. 

Another attack on programming errors involves forms of program analysis 
beyond traditional type checking. The program analysis technique that I focus 
on in this paper is extended static checking (ESC). Drawing from program verification
technology, ESC translates a given program (or part of a program) into
a logical formula called a verification condition. The idea is that the verification 
condition holds if and only if the program satisfies certain desirable properties. 
The verification condition is then passed to a theorem prover that searches for 
counterexamples to the verification condition. Each counterexample corresponds 
to an error condition in the program. 
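The pipeline just described (program, then verification condition, then a search for counterexamples) can be illustrated with a toy Java sketch built on assumptions of ours: programs are made of assignments, `assume`, `assert`, and sequencing; the verification condition is the weakest precondition of the program with respect to `true`; and a brute-force search over a small range of one integer variable stands in for the automatic theorem prover. None of these names come from ESC/Modula-3 or ESC/Java, and real checkers reason symbolically rather than by enumeration.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

/**
 * Toy pipeline: verification condition = wp(program, true); a brute-force
 * search over one integer variable stands in for the theorem prover.
 * All names are illustrative.
 */
final class ToyEsc {
    interface Stmt {}
    record Assign(String var, ToIntFunction<Map<String, Integer>> expr) implements Stmt {}
    record Assume(Predicate<Map<String, Integer>> cond) implements Stmt {}
    record Assert(Predicate<Map<String, Integer>> cond) implements Stmt {}
    record Seq(List<Stmt> body) implements Stmt {}

    /** Weakest precondition of s with respect to postcondition q, computed semantically. */
    static Predicate<Map<String, Integer>> wp(Stmt s, Predicate<Map<String, Integer>> q) {
        if (s instanceof Assign asg) {
            // wp(x := e, Q) = Q[e/x]: evaluate Q in the updated state
            return st -> {
                Map<String, Integer> next = new HashMap<>(st);
                next.put(asg.var(), asg.expr().applyAsInt(st));
                return q.test(next);
            };
        }
        if (s instanceof Assume asm) {          // wp(assume P, Q) = P ==> Q
            return st -> !asm.cond().test(st) || q.test(st);
        }
        if (s instanceof Assert ast) {          // wp(assert P, Q) = P && Q
            return ast.cond().and(q);
        }
        if (s instanceof Seq seq) {             // wp(s1; s2, Q) = wp(s1, wp(s2, Q))
            Predicate<Map<String, Integer>> acc = q;
            for (int k = seq.body().size() - 1; k >= 0; k--) acc = wp(seq.body().get(k), acc);
            return acc;
        }
        throw new IllegalArgumentException("unknown statement: " + s);
    }

    /** Values of var in [lo, hi] that falsify vc; each one is a counterexample. */
    static List<Integer> counterexamples(Predicate<Map<String, Integer>> vc, String var, int lo, int hi) {
        List<Integer> bad = new ArrayList<>();
        for (int v = lo; v <= hi; v++) {
            if (!vc.test(Map.of(var, v))) bad.add(v);
        }
        return bad;
    }
}
```

For `i := i + 1; assert 0 <= i && i < 10`, the search over i in [-2, 12] reports -2 and 9 through 12 as inputs that can violate the bounds check; prefixing `assume 0 <= i && i < 9` leaves no counterexamples.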

Two program checkers that use this ESC technique are the extended static 
checkers for Modula-3 (ESC/Modula-3) […] and for Java (ESC/Java) […].

Although extended static checking shares its building blocks with program 
verification, there’s a fundamental difference between the two techniques: the 
goal of program verification is to prove the full functional correctness of the given 
program, whereas the goal of ESC is merely to find certain common errors, like
null dereferences and array index bounds errors. This difference has far-reaching 
consequences. 

One consequence is that the task of the theorem prover can be expected to be 
simpler than in program verification. Indeed, in ESC/Modula-3 and ESC/Java, 
this simplicity has allowed the underlying theorem prover to be entirely auto- 
matic. This means that users of ESC-based tools do not need the expertise to 
drive an interactive mechanical theorem prover; in fact, users don’t even need 
to know about the underlying theorem prover. 

Another important consequence of the difference between the goals of ESC 
and program verification is that the user-supplied specifications can be expected 
to be simpler. Indeed, in ESC/Java, for example, these light-weight specifica- 
tions, called annotations, revolve around programming concepts and ordinary 
Java boolean expressions; in contrast, proof strategies and user-defined theories, 
which are unfamiliar to most programmers, are neither supported nor needed. 

A third important consequence of ESC’s non-goal of proving the full correct- 
ness of the program is that the ESC technique can allow a verification condition 
generator to be designed to make use of assumptions that are not checked. For 
example, it may be convenient to assume that no arithmetic overflows occur in 
the program, or that the object being initialized by a constructor is not dan- 
gerously “leaked” to an accessible location before all subclass constructors have 
completed their initializing actions. If assumptions are not checked, the resulting 
checking is unsound — if the assumptions are violated by the program, then ESC 
may have missed some of the errors that it was looking for (cf. [1, 8]). On the
upside, the unsoundness can be chosen by the tool designer to further simplify 
the task of the theorem prover and to further simplify the annotation language. 
It is prudent to assume only those conditions that are less likely to be violated 
in practice. 
The level of automation in ESC is reminiscent of type checking. In fact,
the name extended static checking comes from the idea that the technique is 
applied in ways similar to how static type checking is applied, but yielding an 
extended set of checks. There are more similarities: both static type checking and 
extended static checking can perform modular checking, that is, the checking 
can proceed module by module, without needing global program information. 
Modular checking is possible because type checking and ESC rely on annotations 
that specify module boundaries. For example, a type checker can deal with a call 
to a procedure P by knowing only the type signature of P (note in particular 
that the implementation of P is not needed to reason about the call), and the 
type signature is supplied by the programmer as a type declaration. 
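The same modular principle carries over from type signatures to ESC-style contracts. The following toy Java sketch (hypothetical names, bounded integer ranges in place of a theorem prover) checks a call `y = P(x)` using only P's contract: the caller must establish P's precondition, and afterwards may rely on nothing about the result beyond P's postcondition. P's implementation never appears.

```java
import java.util.function.IntPredicate;

/** Toy modular check of a call y = P(x) against P's contract only; names are illustrative. */
final class ModularCall {
    /** A relation between an argument x and a result r. */
    interface IntRel { boolean test(int x, int r); }

    /** The callee's contract: a precondition on x and a postcondition relating x and r. */
    record Contract(IntPredicate pre, IntRel post) {}

    /** Returns a description of the first possible error found, or null if none is found. */
    static String checkCall(IntPredicate callerPre, Contract callee, IntRel callerAssert, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            if (!callerPre.test(x)) continue;                      // states the caller never reaches
            if (!callee.pre().test(x)) return "precondition may fail for x=" + x;
            for (int r = lo; r <= hi; r++) {                       // any result the contract allows
                if (callee.post().test(x, r) && !callerAssert.test(x, r)) {
                    return "assertion may fail for x=" + x + ", result=" + r;
                }
            }
        }
        return null;
    }
}
```

With a contract for an absolute-value routine (no precondition, result r >= 0 with r = x or r = -x), a caller asserting `y >= 0` passes, whereas a caller asserting `y > 0` is flagged at x = 0, without the routine's body ever being consulted.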

ESC annotations are a great feature. Let me list some of their benefits. 

One benefit is that annotations allow programmers to write down their design 
decisions, for example declaring the intention that a certain pointer variable is 
always non-null or that a certain parameter is expected to be a valid index into 
a particular array. Not only does writing down the annotations document the 
design decisions, but it allows ESC to enforce them. 

Another benefit is that the modular checking enabled by the annotations can 
make tools based on ESC scalable. The smaller bodies of code that are analyzed 
as units permit ESC-based checkers to perform a more detailed analysis. In 
contrast, whole-program analyses are forced to abstract from the actual code or 
approximate the behavior of the code in order to scale to large programs. For 
example, the tool PREfix approximates program behavior by considering only
a fixed number of execution paths through each function […], SLAM abstracts
the program behavior by reducing the program's state space to a finite set of
predicates […], and abstract interpretation is a technique for automatically finding
an abstraction of the program […]. While abstraction can be good and is in many
situations sufficient to do useful checking, modular checking affords a tool the 
luxury of using more details. 

While annotations have great benefits, the experience with ESC/Java has
shown that not everyone embraces the use of annotations. Far from it. The
dominant cost in using ESC is the task of supplying annotations. The jury is still
out on whether the benefits of ESC at the current required level of annotation
outweigh its cost. But regardless of whether the ESC tools built so far are
cost-effective, one can apply the ESC technique differently from how it is applied in
the checkers ESC/Modula-3 and ESC/Java. Different applications of ESC can
adjust the required level of annotation differently and can maybe even change 
how annotations are perceived. In the rest of this paper, I discuss four such 
possible applications. 

2 New Applications of the ESC Technique 

2.1 Changing the Unit of Modularity 

The checkers ESC/Modula-3 and ESC/Java perform modular checking at the
level of separate routines. That is, these checkers analyze the implementation of
one routine using only the specifications, not the implementations, of routines
that are called. This approach follows the ideas behind procedural abstraction 
in that the specifications explicitly establish a contract between callers and the 
implementation of each routine. Because programmers tend to divide their code 
into reasonably-sized routines to limit the complexity of each routine, the ap- 
proach of using routines as the unit of modularity naturally imposes comfortable 
limits on the amount of code that is analyzed at a time. 

However, sometimes this level of granularity may not be ideally suited. For 
example, in object-oriented languages like Java, the program checker can often 
use the code of a private method in lieu of (or in addition to) a specification of 
the method. This suggests that one could use the class, rather than the routine, 
as the unit of modularity. By enlarging the unit of modularity in this way, this 
approach would reduce the annotation burden. In Java, one can even consider 
going a step further, making the package (which is a set of classes and interfaces) 
be the unit of modularity. 

While enlarging the unit of modularity has the benefit that less annotation 
is needed, the cost is that verification conditions grow bigger, and the number 
of execution paths to consider through the code can grow dramatically. 

So, new applications of ESC can reduce the annotation burden by adopting 
a different unit of modularity. This is similar to the approach taken by the type 
system in ML, where type inference reduces the need for manual type annotations 
on every function.

2.2 Using ESC as a Subroutine 

Another promising application of the ESC technique is as a precise subroutine for 
analyzing a small piece of code in the context of a more encompassing checker. 
Let me mention two current efforts in this application area. 

Houdini is a program checker that first infers annotations for a given
(unannotated) program and then invokes ESC/Java on the annotated program to
find possible errors […]. The inferred annotations reduce the number of spurious
warnings produced by the call to ESC/Java. Houdini also makes use of ESC/Java
in its inference algorithm: it iteratively calls ESC/Java as a subroutine in order
to discover annotations that are consistent with the program.
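The iteration described above can be sketched as a simple refinement loop. In the Java sketch below, the `refute` function is a hypothetical stand-in for a call to ESC/Java that, assuming all currently kept annotations, reports those it cannot verify; the loop removes refuted candidates and re-runs the checker until nothing more is refuted. This is our reading of the description in the text, not Houdini's code.

```java
import java.util.*;
import java.util.function.Function;

/** Sketch of a Houdini-style refinement loop; the checker is a hypothetical stand-in. */
final class HoudiniSketch {
    /**
     * @param candidates annotations guessed for the program
     * @param refute     given the annotations currently assumed, returns those the
     *                   checker cannot verify (stand-in for a call to ESC/Java)
     * @return the candidate annotations that survive every round
     */
    static Set<String> infer(Set<String> candidates, Function<Set<String>, Set<String>> refute) {
        Set<String> current = new HashSet<>(candidates);
        while (true) {
            Set<String> bad = new HashSet<>(refute.apply(Collections.unmodifiableSet(current)));
            bad.retainAll(current);
            if (bad.isEmpty()) return current;   // every remaining annotation is verified
            current.removeAll(bad);              // drop refuted ones, then re-check the rest
        }
    }
}
```

The repetition matters because removing one annotation can invalidate another that was only provable while the first was assumed: in the test scenario, "b" verifies while "a" is present, so it is dropped only in the second round.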

Another ongoing research effort is to use the tool Daikon's run-time profiling
and statistical invariant inference […] to create annotations that are then checked
by ESC/Java […]. By using ESC as a subroutine, the inferred likely invariants
can be checked for all program inputs, not just the inputs used during the
run-time profiling.
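Roughly, the inference step proposes candidate invariants and keeps those that held on every observed execution. The toy Java sketch below shows only that filtering idea, for a single integer variable and three hard-coded candidates of our choosing; it does not model Daikon's actual candidate grammar or its statistical confidence tests, and all names are hypothetical.

```java
import java.util.*;
import java.util.function.IntPredicate;

/** Toy filtering of candidate invariants against observed values; names are illustrative. */
final class LikelyInvariants {
    private static final Map<String, IntPredicate> CANDIDATES = new LinkedHashMap<>();

    static {
        CANDIDATES.put("x >= 0", x -> x >= 0);
        CANDIDATES.put("x != 0", x -> x != 0);
        CANDIDATES.put("x % 2 == 0", x -> x % 2 == 0);
    }

    /** Returns, in candidate order, the candidates that held on every observed value. */
    static List<String> infer(int[] observed) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, IntPredicate> c : CANDIDATES.entrySet()) {
            if (Arrays.stream(observed).allMatch(c.getValue())) kept.add(c.getKey());
        }
        return kept;
    }
}
```

A property such as x != 0 can survive a handful of runs and still be false in general, which is exactly why the text proposes handing the inferred annotations to ESC/Java for checking over all inputs.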



2.3 Annotation Checkers

Another application of the ESC technique can be found by applying a tool like 
ESC/Java in a particular, limited way. By default, ESC/Java warns about the 
possible misapplication of language primitives (like null dereferences and array 
bounds errors) whenever it cannot establish that such errors won’t occur. For
an unannotated program, this leads to a large number of warnings, which is not 
useful to programmers. 

An alternative is to permanently turn these warnings off (which ESC/Java’s
command-line switches permit), letting the tool check only that the program is
consistent with the given annotations. The resulting tool, an annotation checker, 
will then produce no warnings on an unannotated program. However, as soon 
as an annotation is added, the checker goes to work. This mode of checking, 
although it won’t find checked run-time errors, lets programming teams ease 
their way into using the ESC checker. Even though annotations are required 
in order for the tool to do any actual checking, the annotation burden that 
otherwise arises from checking the use of language primitives is avoided. Stated 
differently, users get only what they pay for. 

It is worth pointing out a particular specification paradigm that seems ideally 
suited for an annotation checker. The paradigm specifies a protocol that clients of 
a given class must satisfy. For example, consider a class representing character 
input streams and containing the operations open, getChar, unGetChar, and 
close. Suppose the intended use of these operations is as follows: open must 
be applied before any other operations are applied, no operations are allowed 
after applying close, and unGetChar is allowed only if the previous operation 
applied to the stream was getChar. To specify this protocol, we can introduce two 
boolean object fields isOpen and isUnGettable (ESC/Java allows the declaration 
of ghost fields for when the fields don’t need to be seen by the compiler) and 
specify the four methods as follows: 

/*@ modifies isOpen, isUnGettable; ensures isOpen */
void open(String filename);

/*@ requires isOpen; modifies isUnGettable; ensures isUnGettable */
int getChar();

/*@ requires isOpen && isUnGettable; modifies isUnGettable; */
void unGetChar();

/*@ requires isOpen; modifies isOpen, isUnGettable; */
void close();

where requires specifies a method’s precondition, modifies specifies which vari- 
ables the method may modify, and ensures specifies the method’s postcondition. 
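To make the protocol concrete, the following C sketch replays one client's sequence of calls against the four specifications above, tracking the two ghost fields and rejecting the sequence as soon as a requires clause fails. An annotation checker establishes this statically for every path through a client; the sketch only does it dynamically for a single trace. The letter encoding of calls and the function check_protocol are hypothetical, and because open, unGetChar, and close leave some modified field unconstrained by their ensures clauses, the sketch conservatively sets such a field to false.

```c
#include <assert.h>
#include <stdbool.h>

/* Ghost state of one character input stream (hypothetical encoding). */
typedef struct {
    bool isOpen;
    bool isUnGettable;
} stream_state;

/* Replays a client's calls: 'o' = open, 'g' = getChar, 'u' = unGetChar,
   'c' = close. Returns false as soon as a requires clause does not hold.
   A field that an operation may modify but whose value its ensures clause
   does not fix is conservatively set to false. */
static bool check_protocol(const char *calls) {
    stream_state st = { false, false };
    for (const char *op = calls; *op != '\0'; op++) {
        switch (*op) {
        case 'o':                                 /* open: no requires clause */
            st.isOpen = true;                     /* ensures isOpen */
            st.isUnGettable = false;              /* modified, unconstrained */
            break;
        case 'g':                                 /* getChar */
            if (!st.isOpen) return false;         /* requires isOpen */
            st.isUnGettable = true;               /* ensures isUnGettable */
            break;
        case 'u':                                 /* unGetChar */
            if (!(st.isOpen && st.isUnGettable))  /* requires isOpen && isUnGettable */
                return false;
            st.isUnGettable = false;              /* modified, unconstrained */
            break;
        case 'c':                                 /* close */
            if (!st.isOpen) return false;         /* requires isOpen */
            st.isOpen = false;                    /* modified, unconstrained */
            st.isUnGettable = false;
            break;
        default:
            return false;                         /* unknown operation */
        }
    }
    return true;
}
```

For example, check_protocol("ogugc") returns true, while "oguu" (two unGetChar calls in a row) and "ocg" (getChar after close) return false.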

In summary, simple protocols like the one above can easily be specified in 
annotations. An annotation checker then checks that any specified protocol is 
observed by clients. While one can write protocol specifications like this one using 
special languages for abstract state machines (see, e.g., AsmL and Vault),
the ESC technique can perform a detailed semantic analysis and allows such
protocol specifications to coexist with other kinds of specifications.
2.4 Programming Language Design

As history shows, programming language design, for those languages that at- 
tract users, is one of the most effective ways to influence software development 
practices. The application of the ESC technique in programming language design 
seems intriguing, notably in cases where the analysis to be performed is mostly 
local to a routine or when the annotation overhead is small. I will outline three 
specific example applications, and will then discuss some obstacles that must be 
overcome to make these applications a reality. 

A primary goal of programming language design is to find grammatical re-
strictions, static checks, and an execution semantics that together guarantee
certain invariants at run time. For example, static and dynamic type checks can
guarantee the invariant that variables hold values of their declared types. One
can take the type systems of today’s popular languages a step further by dis-
tinguishing between non-null types and may-be-null types. For example, let the
type T⁻ represent the set of objects of a class T (or of a subclass of T), and let
the type T⁺ denote the union of that set and the special object null. A dynamic
type cast of an expression e from T⁺ to T⁻ gives rise to a run-time check that
e does not evaluate to the value null. The languages Java, Modula-3, and C++
provide only type T⁺, whereas the language CLU provides the type T⁻ and
allows type T⁺ to be defined using a tagged union. Perhaps one reason why this
null-value discrimination in types has not yet found its way into more languages
is that it is awkward (in the language definition and in programs) to need a
special construct for breaking up the tagged union into two cases. It may be
more natural simply to use an ordinary if statement and compare the value of
a T⁺ expression with null before using the expression as a T⁻ expression. The
ESC technique may see a useful application here, because it could do a precise
analysis of each routine implementation, producing an error for any possible
dereference of null or possible assignment of null to a T⁻ variable. That is,
the language would sport both types T⁻ and T⁺, but would require programmers
to insert a dynamic type cast from T⁺ to T⁻ only where ESC would otherwise
have produced an error.
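A minimal C sketch of the T⁻/T⁺ distinction follows. C has no non-null pointer types, so the sketch assumes a hypothetical encoding: a plain int * stands for T⁺, and a one-field wrapper struct int_nonnull stands for T⁻. The function cast_nonnull plays the role of the dynamic type cast from T⁺ to T⁻, including its run-time null check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical encoding: int * plays the role of T+ (may be null), and the
   wrapper struct plays the role of T- (never null, by construction). */
typedef int *int_maybe_null;               /* T+ */
typedef struct { int *ptr; } int_nonnull;  /* T- */

/* Dynamic type cast from T+ to T-: a run-time check that the value is not null. */
static int_nonnull cast_nonnull(int_maybe_null p) {
    if (p == NULL)
        abort();                           /* the cast fails at run time */
    int_nonnull r = { p };
    return r;
}

/* A T- value can be dereferenced without any further check. */
static int deref(int_nonnull p) {
    return *p.ptr;
}

/* The "ordinary if statement" style: compare with null, then use as T-.
   In the language sketched above, an ESC-style analysis would see that
   p != NULL on this branch and require no cast; C cannot express that, so
   the sketch still calls cast_nonnull, whose check then always succeeds. */
static int value_or(int_maybe_null p, int fallback) {
    if (p != NULL)
        return deref(cast_nonnull(p));
    return fallback;
}
```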

Let’s consider a second example. Uninitialized local variables are a problem in
some languages, like C, and lint-like checkers are wise to warn about situations
where this is a possible error. To prevent such errors, Java builds in a “definite
assignment” rule that guarantees that every local variable is initialized (to a
value of the variable’s type) before it is used. The rule usually does well, but not
for a code segment like



int x;

if (B) { ... assign to x; ... }
if (C) { ... use x; ... }

where condition C evaluates to true only if the first if statement’s then clause is
executed. The coarseness of Java’s definite assignment rule will cause the com-
piler to report an error for this code segment (likewise, any coarse lint-like checker
will produce a spurious warning). In these cases, one might have wished for a dy-
namic check instead, but programmers are forced to add a “dummy” assignment
to x before the first if statement. If the programmer’s assumption about condi-
tion C is mistaken, then the dummy assignment will have introduced an error
that is harder to catch. Here, the more detailed analysis of the ESC technique
can come to the rescue (see, e.g., ESC/Java’s uninitialized annotation).
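The following C sketch shows the kind of dynamic check one might have wished for instead of a dummy assignment. It is a hypothetical illustration, not ESC/Java's mechanism: a ghost flag x_assigned, which is not part of the original code, records whether x was assigned, and the use of x asserts that flag. If the assumption that C holds only when B held turns out to be wrong, the program stops at the use rather than silently computing with a dummy value.

```c
#include <assert.h>
#include <stdbool.h>

/* B and C are the two conditions of the text; the caller is assumed to
   guarantee that C implies B. */
static int use_after_conditional_assignment(bool B, bool C) {
    int x;                         /* deliberately no dummy initializer */
    bool x_assigned = false;       /* hypothetical ghost flag */
    int result = 0;
    if (B) {
        x = 10;                    /* ... assign to x ... */
        x_assigned = true;
    }
    if (C) {
        assert(x_assigned);        /* dynamic check replacing the dummy assignment */
        result = x + 1;            /* ... use x ... */
    }
    return result;
}
```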

As a third example, we may consider including language support for speci- 
fying and checking protocols like those referred to in the previous subsection. 

To realize any of these improvements in language design through applications 
of the ESC technique, one must overcome some obstacles. I’ll describe the three 
primary obstacles. 

A first obstacle is that ESC checking can be unsound. For a language to 
guarantee certain program invariants, the ESC technique had better be applied 
only in sound ways. One may be able to achieve soundness, but the challenge is 
to do so while retaining the power of the ESC technique and keeping annotation 
to a bare minimum. 

Another obstacle is the possible need to know which variables a routine can
modify. For instance, to enforce a protocol like the one in the example in Sec-
tion 2.3, one needs to know whether or not a call can modify isOpen. Specifying
the possible modifications of a routine is difficult, because it adds complexity to
the annotation language. In fact, to ensure soundness, one may need to introduce
a discipline for alias confinement, which may in turn require knowing which
variables a routine can read.

A third obstacle is that to include ESC checks among the static checks pre- 
scribed by a programming language, the language definition must include enough 
of a description of the verification condition generator and theorem prover that 
all compilers will agree on which programs are to be accepted. For example, if the 
language definition were to leave the strength of the theorem prover unspecified, 
then a program accepted by a compiler with a powerful theorem prover might 
not be accepted by a compiler with a more impoverished theorem prover. There 
is hope, however: a language may define the generation of verification conditions
into some decidable mathematical domain. For example, if the verification con-
ditions are formulas in first-order logic with uninterpreted function symbols and
equality, then a compiler can use any sound and complete decision procedure
for congruence closure; the language definition would cleanly say what needs
to be checked, not how it is to be checked. Restricting verification conditions to
decidable domains limits the power of the ESC technique, but in the case of
language design, such limitations may be appropriate.
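To illustrate what a decision procedure for congruence closure decides, here is a deliberately naive C sketch for ground equalities over uninterpreted function symbols. It keeps equivalence classes in a union-find structure and, after each asserted equality, repeatedly merges any two applications of the same symbol whose arguments are already equal, until nothing changes. Practical implementations use far more efficient algorithms; the term representation, the fixed capacity, and all names here are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TERMS 64

/* A term is a constant (nargs == 0) or an application of an uninterpreted
   function symbol to one or two argument terms, given by index. */
typedef struct {
    int sym;
    int nargs;
    int args[2];
} term;

static term terms[MAX_TERMS];
static int parent[MAX_TERMS];   /* union-find forest over term indices */
static int nterms;

static void reset(void) { nterms = 0; }

static int mk(int sym, int nargs, int a0, int a1) {
    assert(nterms < MAX_TERMS);
    terms[nterms] = (term){ sym, nargs, { a0, a1 } };
    parent[nterms] = nterms;
    return nterms++;
}

static int find(int t) {
    while (parent[t] != t) {
        parent[t] = parent[parent[t]];   /* path halving */
        t = parent[t];
    }
    return t;
}

/* Same symbol, same arity, and pairwise-equal arguments. */
static bool congruent(int s, int t) {
    if (terms[s].sym != terms[t].sym || terms[s].nargs != terms[t].nargs)
        return false;
    for (int k = 0; k < terms[s].nargs; k++)
        if (find(terms[s].args[k]) != find(terms[t].args[k]))
            return false;
    return true;
}

/* Assert s = t, then close the relation under congruence (naive fixpoint). */
static void merge(int s, int t) {
    parent[find(s)] = find(t);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nterms; i++)
            for (int j = i + 1; j < nterms; j++)
                if (find(i) != find(j) && congruent(i, j)) {
                    parent[find(i)] = find(j);
                    changed = true;
                }
    }
}

static bool equal(int s, int t) { return find(s) == find(t); }

enum { A = 1, B, C, F, G };   /* hypothetical symbol names */

/* Does a = b imply g(f(a)) = g(f(b))? (It should.) */
static bool example_congruence(void) {
    reset();
    int a = mk(A, 0, 0, 0), b = mk(B, 0, 0, 0);
    int gfa = mk(G, 1, mk(F, 1, a, 0), 0);
    int gfb = mk(G, 1, mk(F, 1, b, 0), 0);
    merge(a, b);
    return equal(gfa, gfb);
}

/* Do f(f(f(a))) = a and f(f(f(f(f(a))))) = a imply f(a) = a? (They should.) */
static bool example_classic(void) {
    reset();
    int t[6];
    t[0] = mk(A, 0, 0, 0);
    for (int k = 1; k <= 5; k++)
        t[k] = mk(F, 1, t[k - 1], 0);
    merge(t[3], t[0]);
    merge(t[5], t[0]);
    return equal(t[1], t[0]);
}

/* Does a = b imply f(a) = f(c)? (It should not.) */
static bool example_non_consequence(void) {
    reset();
    int a = mk(A, 0, 0, 0), b = mk(B, 0, 0, 0), c = mk(C, 0, 0, 0);
    int fa = mk(F, 1, a, 0), fc = mk(F, 1, c, 0);
    merge(a, b);
    return equal(fa, fc);
}
```

Because the answer depends only on the equalities and not on how they are processed, any two compilers using a sound and complete procedure of this kind would accept the same programs.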

3 Conclusion 

Extended static checking is a powerful program analysis technique. To date, the 
primary application of the technique has been to standalone program checkers 
that search for misapplications of language primitives, perform modular check- 
ing routine by routine, and require annotations. This paper has sketched some 
new application areas of the ESC technique. One of these areas for which the
potential gain is particularly high is programming language design, but sev-
eral research challenges remain to make this application a reality. By exploring
various applications of the ESC technique, possibly in combination with other
techniques, we can hope to reduce the cost of producing quality software.
Abstract. All practical C programs use structures, arrays, and/or
strings. At runtime, such objects are mapped into consecutive mem- 
ory locations, hereafter referred to as buffers. Many software defects are 
caused by buffer overflow — unintentional access to memory outside the 
intended object. String manipulation is a major source of such defects. 
According to the FUZZ study, they are the cause of most UNIX failures. 
We present a new algorithm for statically detecting buffer overflow de- 
fects caused by string manipulations in C programs. In many programs, 
our algorithm is capable of precisely handling destructive memory up- 
dates, even in the presence of overlapping pointer variables which refer- 
ence the same buffer at different offsets. Thus, our algorithm can uncover 
defects which go undetected by previous works. We reduce the problem 
of checking string manipulation to that of analyzing integer variables. 

A prototype of the algorithm has been implemented and applied to stat- 
ically uncover defects in real C applications, i.e., errors which occur on 
some inputs to the program. The applications were selected without a pri- 
ori knowledge of the number of string manipulation errors. A significant 
number of string manipulation errors were found in every application, 
further indicating the extensiveness of such errors. We are encouraged 
by the fact that our algorithm reports very few false alarms, i.e., warn- 
ings on errors that never occur at runtime. 



1 Introduction 

Strings are frequently used in C programs, despite being one of the most unsafe
types. Many software defects result from misusing strings, pointers to strings,
and standard C string functions (e.g., gets(), strcpy()). We refer to such bugs
as string-manipulation cleanness violations, since they should not exist in any
“reasonable” program, independent of the program specification. In general, we
say that an expression is unclean if there exists an input to the program on which
the result of the expression is undefined according to ANSI C semantics.
An expression is clean when no such input exists. A program is clean when
it contains only clean expressions. Thus, the behavior of a clean C program is
always predictable and does not depend on the environment.

String-manipulation cleanness violations lead to many nasty bugs. For ex-
ample, we have found that 60% of the UNIX failures reported in the 1995 FUZZ
study are due to string-manipulation cleanness violations. In Nov. 1988, the
Internet worm incident used a buffer overflow in fingerd to attack 60,000 com-
puters. Furthermore, CERT advisories indicate that buffer overflows account
for up to 50% of today’s software vulnerabilities.

1.1 Main Results 

In this paper, we describe an algorithm that discovers potentially unclean string-
manipulation expressions. Figure 1 shows three interesting erroneous C func-
tions analyzed by our algorithm. Unclean expressions are underlined. Figure 1(a)
shows an example of overlapping. After statement l2, p overlaps s by 5, i.e., p
points to the sixth character in the string pointed to by s. This implies that after
statement l3, which destructively updates p, s points to a 12-character string
and that the expression at l4 is unclean: it copies 13 bytes (including the null
terminator) into a 10-byte array. Our static analysis algorithm detects this.



void simple() {
    char s[20], t[10], *p;
l1: strcpy(s, "Hello");
l2: p = s + 5;
l3: strcpy(p, " world!");
l4: strcpy(t, s);
}

(a)

/* web2c [strpascal.c] */
void null_terminate(char *s) {
l1: while (*s != ' ')
l2:     s++;
l3: *s = 0;
}

(b)

/* web2c [fixwrites.c] */
#define BUFSIZ 1024
char buf[BUFSIZ];

char *insert_long(char *cp) {
    char temp[BUFSIZ];
    int i;
l1: for (i = 0; &buf[i] < cp; ++i)
l2:     temp[i] = buf[i];
l3: strcpy(&temp[i], "(long)");
l4: strcpy(&temp[i + 6], cp);
l5: strcpy(buf, temp);
l6: return cp + 6;
}

(c)

Fig. 1. Unclean string manipulation functions. Unclean expressions are under-
lined. The function simple is from a cited source, and the other two functions
are from web2c 6.1.

A typical error of accessing a string without bound-checking is shown in
Figure 1(b). If the string pointed to by s does not contain a blank, s will be
incremented beyond the allocated bounds. Our algorithm analyzes pointer arith-
metic and reports a potentially unclean expression inside the loop.
It is quite challenging to find unclean expressions in the function insert_long
shown in Figure 1(c) without reporting false alarms. As shown in Section 3, our
algorithm achieves that. This function inserts the string "(long)" at offset cp
by copying the data up to cp into a temporary array, appending the string
"(long)", and appending the remainder of the string. When invoked, cp over-
laps buf. A buffer overflow occurs at l3 and/or l4 if not enough space is left in
temp. Therefore, these expressions are flagged as potentially unclean. In addi-
tion, our algorithm verifies that all the other expressions buf[i], temp[i], and
strcpy(buf, temp) are clean. Knowing that buf and cp overlap and analyzing
the condition at statement l1 is necessary in order to validate that the access to
buf[i] is within bounds, and so is temp[i] (since i is less than BUFSIZ).

Technically, we propose to reduce the problem of checking string-manipulation 
cleanness violations to that of analyzing integer variables — a problem with 
many known solutions. Although the problem of string-manipulation cleanness 
checking is to verify that accesses are within bounds, it is more complicated 
than array bounds checking. The domain of string programs requires that the 
analysis be capable of tracking the following features of the C programming
language: (i) handling standard C functions such as strcpy() and strlen(), which
perform an unbounded number of loop iterations, in a rather precise way; (ii) stati-
cally estimating the length of strings (in addition to the size of allocated arrays);
this length changes dynamically, based on the index of the first null charac-
ter; and (iii) simultaneously analyzing pointer and integer values, which is required
in order to precisely handle pointer arithmetic and destructive updates. Our anal-
ysis is the first with these capabilities. The overlapping information enables our
method to handle destructive updates in a rather precise way. When updating a
pointer value such as *p=0, the overlapping information between p and another
pointer, say q, is used to compute the effect of the update on the length of q. In
particular, this paper reports three main results:

1. A source-to-source transformation that produces an instrumented C program
which asserts when a string-manipulation cleanness violation occurs. The
transformation is described in Section 2.

2. In Section 3, we show that the integer analysis algorithm of Cousot and
Halbwachs can be used to analyze the instrumented program to find all
potentially unclean string expressions in a rather precise way. Methods with
lower complexity, such as range analysis [3], can also be used.

3. We have implemented both the source-to-source transformation and the
static analysis algorithms and applied them to real C programs (Section 4).
The experimental results are quite encouraging. We analyze real programs
and locate nontrivial software defects with only a few false alarms.

In the following subsections we elaborate on these results. 



A Source-to-Source C Transformation. We describe a source-to-source pro-
gram transformation that produces an instrumented C program which dynam-
ically checks that expressions are clean using assert statements. For example,
the instrumented version of Figure 1(a), shown in Figure 2, will abort at cond4,
indicating that the expression strcpy(t, s) at l4 of the original program is
unclean. The details of the transformation technique are explained in Section 2.



void simple() {
        char s[20]; int s_len = 0, s_alloc = 20;
        char *p;    int p_len = 0, p_alloc = 0;
        char t[10]; int t_len = 0, t_alloc = 10;
        int p_overlaps_s = 0;

        cond1:   assert(s_alloc > 5);
l1: strcpy(s, "Hello");
        post1_1: s_len = 5;
        cond2:   assert(s_alloc >= 5);
l2: p = s + 5;
        post2_1: p_overlaps_s = 5;
        post2_2: p_len = s_len - 5;
        post2_3: p_alloc = s_alloc - 5;
        cond3:   assert(p_alloc > 7);
l3: strcpy(p, " world!");
        post3_1: p_len = 7;
        post3_2: s_len = p_len + p_overlaps_s;
        cond4:   assert(t_alloc > s_len);
l4: strcpy(t, s);
        post4_1: t_len = s_len;
}

Fig. 2. The instrumented function simple from Figure 1(a). Added instrumen-
tation code is indented and appears in boldface.



If the original program is clean, then the instrumented program yields the 
same results. However, if the original program contains an unclean string expres- 
sion, then the instrumented program will abort on an assert statement, while 
the result of the original program is undefined. 
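The arithmetic behind Figure 2 can be replayed on its own. The C sketch below is a hypothetical helper, not part of the transformation: it executes only the instrumentation statements of Figure 2 on the integer variables, skips the original strcpy calls (so nothing actually overflows), and reports the first cleanness condition that fails. The parameter t_size is an assumed knob for the declared size of t; the value 10 corresponds to Figure 1(a).

```c
#include <assert.h>

/* Returns the number of the first failing cleanness condition of Figure 2,
   or 0 if all of them hold. The original strcpy calls are not executed. */
static int first_failing_condition(int t_size) {
    int s_alloc = 20, s_len = 0;
    int p_alloc = 0, p_len = 0, p_overlaps_s = 0;
    int t_alloc = t_size, t_len = 0;

    if (!(s_alloc > 5)) return 1;      /* cond1: room for "Hello" and its null */
    s_len = 5;                         /* post1_1 */

    if (!(s_alloc >= 5)) return 2;     /* cond2: s + 5 is within bounds */
    p_overlaps_s = 5;                  /* post2_1 */
    p_len = s_len - 5;                 /* post2_2 */
    p_alloc = s_alloc - 5;             /* post2_3 */

    if (!(p_alloc > 7)) return 3;      /* cond3: room for " world!" and its null */
    p_len = 7;                         /* post3_1 */
    s_len = p_len + p_overlaps_s;      /* post3_2: s now has length 12 */

    if (!(t_alloc > s_len)) return 4;  /* cond4: 13 bytes must fit in t */
    t_len = s_len;                     /* post4_1 */
    (void)t_len;
    return 0;
}
```

With t_size = 10 the replay stops at cond4, because s_len is 12 after post3_2 and 13 bytes would be copied into t; with t_size = 13 every condition holds.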

The idea of instrumenting programs to dynamically check cleanness is not 
new. For example, SafeC and other tools also instrument the C source with asser-
tions, and Purify instruments the executable files. These tools instrument every
physical buffer by extending it with additional information and checking this 
additional information at runtime. Our approach is to instrument the program, 
adding computations on new integer variables, but analyze them at compile time. 



Applying Integer Analysis. The static problem is to analyze the instru- 
mented program and to mark potentially unclean expressions by detecting that 
an assert statement may not hold. We aim at a conservative solution, i.e., one
that never misses an unclean string manipulation. Since our goal is to prove that
conservative static cleanness checking of realistic C programs is feasible, while
generating only a small number of false alarms, we use a rather expensive but
precise integer analysis that detects linear restraints among the integer variables
that our transformation introduces. The integer analysis is conservative, thus
resulting in an algorithm that detects all unclean string manipulations.



A Prototype Implementation. We analyzed functions with massive string 
manipulations from three different applications: (i) fixoutput — a checker for 
the output of a lexical analyzer; (ii) agrep — a grep application; and (iii) web2c 
— a converter from TeX, Metafont, and other related WEB programs to C. A 
total of 19 errors were detected with only four false alarms (see the results
table and Section 4 for more information on the analysis results).



1.2 Related Work 

Many academic and commercial projects aim at producing tools that detect
string-manipulation cleanness violations at runtime. Due to the overhead of
runtime checking, these tools are usually not used in production. Moreover,
their effectiveness strongly depends on the input tested, and these checks do not
assure against future bugs on different inputs. There are also tools, such as
StackGuard, that directly identify cleanness violations leading to security
vulnerabilities.

An unpublished success story in program analysis is the usage of the AST
ToolKit to identify 23 string violation bugs in Office 10 by scanning syntax
trees. An extension to LCLint, a widely used static checking tool, that checks
for buffer overflow vulnerabilities has also been presented. Following the ap-
proach of LCLint, its authors aim at a scalable tool by using lightweight
techniques. Procedures are annotated with constraints on the allocated space
and on the string length. The loop body is analyzed once, and then heuristics
are used to estimate the number of loop iterations and their effect on the
constraints.

Wagner et al. present an algorithm that statically identifies string-
manipulation cleanness violations and apply it to uncover nontrivial defects in
real software. This algorithm is based on flow-insensitive analysis, and thus
can handle large programs. However, it achieves that at the price of generating
many false alarms, and, even worse, many errors are missed. For example, the
bug in Figure 1(a) (which our algorithm correctly identifies) is missed by their
method because it does not handle pointer arithmetic. Indeed, in analyzing the
reasons behind their false alarms, they have identified four needed techniques:
flow-sensitivity, context-sensitivity, pointer analysis, and linear invariants. Our
method includes pointer analysis and linear invariants. It is flow-sensitive, and
annotations are used to analyze function calls.

To summarize, the above static tools do not guarantee conservative results and
produce false alarms. In contrast, since our method is conservative, we assure
against certain defects and the user is relieved from the need to test against
them. Our method is heavier than these tools but can identify cleanness errors
resulting from destructive updates. Also, dynamic cleanness tools can benefit
from our analysis, which identifies cases where runtime checks are redundant. The
reader is referred to m for additional references. 

Limitations. Our instrumentation does not handle multi-dimensional arrays,
structures, and multilevel pointers. These features can be handled by combining
our method with instrumentation inside buffers, e.g., as done by SafeC. Our
static analysis algorithm can handle these constructs conservatively.

Code size increase due to the transformation is bounded by O(m · n²), where
m is the number of statements and n is the number of variables. However,
applying a very simple flow-insensitive analysis prior to the transformation phase
(see Section 2.2) keeps the code increase much lower in practice for all
our benchmark programs.

Our cleanness algorithm does not currently check the cleanness of subtraction
expressions (i.e., p = q + exp when exp < 0). This is an easy extension, which we
choose to omit in this paper.

The main limitation of our prototype implementation is the handling of func-
tion calls. Obviously, cleanness checking should yield only a few false alarms, if
any. Therefore, since we currently rely on an expensive static technique, we
avoid interprocedural analysis and allow preconditions and postconditions of
each function to be specified (see Section 4.1). Our initial experience indicates
that even a straightforward implementation of the algorithm handles mid-size
programs and that it is easy to provide annotations for these programs. We have
no experience with large programs.

2 The Transformation 

The instrumented program contains the original program with additional vari-
ables and statements. Figure 2 shows the instrumented function of Figure 1(a).
The instrumented code is indented and appears in boldface. For reasons of clar-
ity, we omitted some of the instrumented code and discuss it later in Section 2.4.

A string variable is either a pointer to char, such as p in the example, or
an array of characters, such as s and t. For every string variable v, two integer
variables are added, v_len and v_alloc. The first holds the length of the string
v (excluding the null terminator) and the second holds the allocated size of the
buffer pointed to by v. These integers are initialized depending on the declaration
and the initialization of the variable v.

We say that a pointer variable p overlaps a pointer variable q by i if p points
to the same physical buffer as q but at an offset of i bytes. That is, p = q + i.
Note that if p overlaps q by i, then q overlaps p by -i. For every pair of string
variables v1 and v2, an integer variable v1_overlaps_v2 is added to hold the
overlapping index between them. To reduce the number of overlapping variables,
a pre-transformation analysis is applied to detect variables that cannot overlap;
see Section 2.2.

In addition to the integer variables, Boolean variables are added to indicate
the status of the variable: (i) is_v_alloc indicates whether variable v is allocated;
(ii) is_v_null indicates whether the data pointed to by v is a null-terminated
string; and (iii) is_v1_overlaps_v2 indicates whether variables v1 and v2 overlap.

For every statement st, there is a cleanness condition that is checked via an
assert statement prior to st. The cleanness condition is generated in a
syntax-directed fashion and is labeled cond_st. For example, the cleanness
condition of the statement strcpy(dst, src) is that the length of src is less than
the allocated size of dst. This guarantees that there is sufficient space to copy
the string src and the null-terminator byte.
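As a sketch of how the cleanness condition and the update statements fit together for strcpy(dst, src), the following C code bundles the instrumentation variables of one string into a struct. The struct str_info, the function checked_strcpy, and the helper try_copy are hypothetical; the transformation itself uses separate variables such as dst_len and dst_alloc and an assert rather than a Boolean result. The sketch additionally requires src to be null-terminated, an assumption on our part, since otherwise src_len has no meaning.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bundle of the variables v_len, v_alloc, and is_v_null. */
typedef struct {
    char *buf;
    int len;       /* index of the first null byte */
    int alloc;     /* bytes available from buf to the end of the buffer */
    bool is_null;  /* buf holds a null-terminated string */
} str_info;

/* cond: src_len < dst_alloc, so the string and its null byte fit in dst.
   Returns false, without copying, if the condition does not hold. */
static bool checked_strcpy(str_info *dst, const str_info *src) {
    if (!(src->is_null && src->len < dst->alloc))
        return false;
    strcpy(dst->buf, src->buf);
    dst->len = src->len;        /* post: update the instrumented variables */
    dst->is_null = true;
    return true;
}

/* Copies text into a destination declared to hold dst_size bytes
   (at most 64) and reports whether the cleanness condition held. */
static bool try_copy(int dst_size, const char *text) {
    char src_buf[64], dst_buf[64];
    assert(dst_size <= 64 && strlen(text) < sizeof src_buf);
    strcpy(src_buf, text);
    str_info src = { src_buf, (int)strlen(text), (int)sizeof src_buf, true };
    str_info dst = { dst_buf, 0, dst_size, false };
    return checked_strcpy(&dst, &src) && dst.len == src.len;
}
```

For example, copying "Hello" (length 5) needs a destination of at least 6 bytes, so try_copy(6, "Hello") succeeds while try_copy(5, "Hello") fails the condition.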

Every statement st is followed by statements that update the instrumented
variables. These are labeled post_st. In the example this is rather straight-
forward. After statement l1, the length of s is updated. After statement l2,
p_overlaps_s and the length and allocated size of p are set. After statement l3,
the length of p is updated. In addition, since p overlaps s, the length of s is also
updated.



2.1 Simplifier 

The first phase of the transformation is a C-to-C simplifier, which transforms
every expression that may be unclean into a statement by assigning its value
to a new variable. For example, the expressions &temp[i] and &temp[i+6]
in statements l3 and l4 in Figure 1(c) are assigned to two new variables. This
serves to mark that the first expression overlaps temp and the second expression
overlaps both temp and the first expression. With the overlapping information
we can precisely handle the destructive updates at statements l3 and l4 and
compute their effect on the length of temp.



2.2 Preliminary Analysis 

The number of overlap variables added by the transformation can be quadratic 
in the number of string variables. However, in a typical program the number 
of variables that do overlap is small. Only assignments and function calls may 
generate overlapping variables. A preliminary intraprocedural flow-insensitive 
analysis is performed to detect which variables may overlap. This is done by using 
a union-find algorithm which assumes that if a may overlap b and b may overlap 
c then a may overlap c. This algorithm is overly conservative. For example, it 
assumes that the result value of a function call may overlap any of the input 
parameters and that every pair of global string variables may overlap. Even 
so, the experimental results described in Section 4 show that the number of
overlapping variables is dramatically reduced.
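A minimal C sketch of this preliminary analysis follows, assuming five hypothetical string variables s, t, p, q, and r and the two assignments p = s + 5 and q = p + 1 somewhere in the procedure. Each assignment merges the classes of its two sides regardless of control flow, and only pairs that end up in the same class keep an overlap variable.

```c
#include <assert.h>

/* Hypothetical string variables of one procedure. */
enum { V_S, V_T, V_P, V_Q, V_R, N_VARS };

static int uf_parent[N_VARS];

static int uf_find(int v) {
    while (uf_parent[v] != v)
        v = uf_parent[v];
    return v;
}

/* "a may overlap b" is treated as an equivalence relation. */
static void uf_union(int a, int b) {
    uf_parent[uf_find(a)] = uf_find(b);
}

/* Flow-insensitive pass over the (hypothetical) assignments of the procedure. */
static void analyze_example(void) {
    for (int v = 0; v < N_VARS; v++)
        uf_parent[v] = v;
    uf_union(V_P, V_S);   /* p = s + 5; */
    uf_union(V_Q, V_P);   /* q = p + 1; */
}

static int example_may_overlap(int a, int b) {
    analyze_example();
    return uf_find(a) == uf_find(b);
}

/* Number of overlap variables v1_overlaps_v2 still needed (unordered pairs). */
static int example_overlap_vars(void) {
    analyze_example();
    int n = 0;
    for (int a = 0; a < N_VARS; a++)
        for (int b = a + 1; b < N_VARS; b++)
            n += (uf_find(a) == uf_find(b));
    return n;
}
```

Here 3 of the 10 possible pairs keep an overlap variable; s and q end up in the same class only through transitivity.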



Cleanness Checking of String Manipulations in C Programs 201 



2.3 Transformation of Statements 

Table 1(a)-(c) shows the transformation of basic statements. The original state-
ment appears in boldface; its cleanness condition, if any, is written above the
statement, and the update statements of the instrumented variables are written
below the statement.

The assignment of a constant string to variable dst is fairly simple. There 
is no cleanness condition to check. The instrumented variables are updated ac- 
cording to the length of the constant which is known at the transformation time. 
All dst-overlapping variables are set to false. This is safe even in the presence 
of two pointers to the same constant string (the same label in the data section) 
since no updates can take place in the data section. 

Assignment of pointer arithmetic is one of the most complicated statement
types to instrument. Following the cited C reference (p. 205), we verify that the
result of a pointer arithmetic expression is within bounds or at the first location
beyond the rightmost character. The allocation size of dst is computed using the
allocation size of src. Since src_len holds the position of the first null byte in
the string starting at src, the length of dst is computed from the length of src
only if dst is set between src and the null terminator. In other cases, the
length of dst is computed using the macro RECOMPUTE shown in Table 2. This
macro computes the length by calling strlen() and sets the Boolean variable
is_null to true if the null terminator is within bounds. We try to minimize the
use of this macro, since it cannot be analyzed statically in a precise manner.

Assignments create overlapping. We set the overlapping variable between 
the dst and src of the statement and also between dst and any other dst- 
overlapping variable according to its src-overlapping. 
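The pointer-arithmetic case can be sketched in C as follows, under stated assumptions: the per-variable instrumentation is bundled in a hypothetical struct str_info, and the recomputation scans with memchr bounded by the allocated size instead of calling strlen() as RECOMPUTE does, so the sketch never reads past a buffer. Only the overlap between dst and src is recorded; updating the other dst-overlapping variables is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    char *ptr;
    int len;       /* v_len */
    int alloc;     /* v_alloc: bytes from ptr to the end of the buffer */
    bool is_null;  /* is_v_null */
} str_info;

/* Bounded variant of RECOMPUTE: look for a null byte within alloc bytes. */
static void recompute(str_info *v) {
    const char *nul = v->alloc > 0 ? memchr(v->ptr, '\0', (size_t)v->alloc) : NULL;
    v->is_null = (nul != NULL);
    v->len = nul ? (int)(nul - v->ptr) : 0;
}

/* Instrumented  dst = src + i.  Returns false if the cleanness condition fails. */
static bool ptr_add(str_info *dst, int *dst_overlaps_src, const str_info *src, int i) {
    if (!(0 <= i && i <= src->alloc))    /* within bounds or one past the end */
        return false;
    dst->ptr = src->ptr + i;
    *dst_overlaps_src = i;               /* dst = src + i */
    dst->alloc = src->alloc - i;
    if (src->is_null && i <= src->len) { /* dst lies at or before src's first null */
        dst->len = src->len - i;
        dst->is_null = true;
    } else {
        recompute(dst);
    }
    return true;
}

/* Usage: s holds "Hello" in a 20-byte array; returns the length of p = s + off,
   -1 if the cleanness condition fails, or -2 if p is not null-terminated. */
static int len_after_add(int off) {
    static char s_buf[20] = "Hello";
    str_info s = { s_buf, 5, 20, true }, p;
    int p_overlaps_s;
    if (!ptr_add(&p, &p_overlaps_s, &s, off))
        return -1;
    (void)p_overlaps_s;
    return p.is_null ? p.len : -2;
}
```

With off = 5, p points at the null byte of "Hello" and gets length 0, matching post2_2 of Figure 2; with off = 21 the result would lie beyond the buffer and the condition fails.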

Destructive updates such as src[i] = c come in two flavors: the assign-
ment of ASCII 0 (a null terminator) and the assignment of any other character. For
assignments of a null terminator, we check whether the length of src has
changed. This happens when the current length (if it exists) is greater than i.
Every destructive update to src may affect the length of a variable a that overlaps
src. The macro DESTRUCTIVE_UPDATE (shown in Table 2) detects whether a has been
affected and, if so, updates the length of a. In some cases the macro RECOMPUTE
must be used to compute the new length of a. The case of a non-null character
assignment is simpler, since only the case where the null terminator is removed
affects the instrumented variables.
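The following C sketch traces the integer bookkeeping for a null assignment dst[i] = 0 and its effect on one overlapping variable a, following the condition in the DESTRUCTIVE_UPDATE macro of Table 2. The struct len_info and the function names are hypothetical, the cleanness condition on i is assumed to have been checked already, and the RECOMPUTE fallback is only signalled, not performed.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int len;       /* v_len */
    bool is_null;  /* is_v_null */
} len_info;

/* dst[i] = 0: the length changes only if there was no null before index i. */
static void write_null(len_info *dst, int i) {
    if (!dst->is_null || i < dst->len) {
        dst->len = i;
        dst->is_null = true;
    }
}

/* Effect on a, where a = dst + a_overlaps_dst. Returns false where the
   macro would fall back to RECOMPUTE(a). */
static bool update_overlapping(len_info *a, int a_overlaps_dst, const len_info *dst) {
    if (dst->is_null &&
        a_overlaps_dst < dst->len &&                  /* a starts before dst's first null */
        (!a->is_null || a_overlaps_dst > -a->len)) {  /* a's old null lies after dst's start */
        a->is_null = true;
        a->len = dst->len - a_overlaps_dst;
        return true;
    }
    return false;
}

/* Scenario of Figure 1(a) after l3: s = "Hello world!" (length 12) and
   p = s + 5 (length 7), so s overlaps p by -5. Executes p[i] = 0 and
   returns the new length of s. */
static int s_len_after_null_at(int i) {
    len_info s = { 12, true }, p = { 7, true };
    write_null(&p, i);
    bool ok = update_overlapping(&s, -5, &p);
    assert(ok);
    return s.len;
}
```

Writing a null at p[1] shortens s to "Hello " (length 6), which the overlap offset -5 lets the instrumentation compute without rescanning s.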

Table 1(d)-(g) lists the transformation code of a few common standard C
functions, which is rather straightforward: malloc() sets the allocation to the
new allocation size and eliminates any existing overlapping; strcpy(dst, src)
sets the length of dst; strlen(src) checks that src is null-terminated and
assigns src_len to the left-hand side; strcat(dst, src) checks that both pa-
rameters are null-terminated and computes the new length of dst. In every
destructive update, overlapping variables are updated. Some cleanness condi-
tions verify only the is_null variable, since we already maintain the invariant
is_null ⇒ is_alloc. Due to space limitations, we do not show the transformation
code of the more complicated C library functions.
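Because the strcat() entry of Table 1 is not reproduced here, the following C sketch only illustrates the two steps the text describes, under an assumed form of the cleanness condition: both operands must be null-terminated, and the combined length plus the null byte must fit in dst. The struct and function names are hypothetical, and the update of overlapping variables is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of v_len, v_alloc, and is_v_null for one string. */
typedef struct {
    int len;
    int alloc;
    bool is_null;
} str_info;

/* Assumed cleanness condition for strcat(dst, src). */
static bool strcat_cond(const str_info *dst, const str_info *src) {
    return dst->is_null && src->is_null && dst->len + src->len < dst->alloc;
}

/* Update after strcat(dst, src): the new length of dst. */
static void strcat_post(str_info *dst, const str_info *src) {
    dst->len += src->len;
}

/* Returns the new length of dst, or -1 if the condition fails. */
static int strcat_result(int dst_len, int dst_alloc, int src_len) {
    str_info dst = { dst_len, dst_alloc, true };
    str_info src = { src_len, src_len + 1, true };
    if (!strcat_cond(&dst, &src))
        return -1;
    strcat_post(&dst, &src);
    return dst.len;
}
```

For example, with dst holding "Hello" (length 5) in a 12-byte array, appending the 7-character " world!" fails the condition, since 5 + 7 = 12 is not less than 12; a 13-byte array suffices.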



202 Nurit Dor, Michael Rodeh, and Mooly Sagiv 



Table 1. The transformation code of basic statements: (a) assignment of a string
constant, (b) pointer arithmetic, and (c) an update to an index in the array; and
of a few C library functions: (d) malloc(), (e) strcpy(), (f) strlen(), and
(g) strcat(). The original statement appears in boldface.
Table 2. The macros’ code used in the transformation.

#include <stdbool.h>
#include <string.h>

#define RECOMPUTE(str)                                              \
    do {                                                            \
        str##_len = strlen(str);                                    \
        is_##str##_null = (str##_len < str##_alloc);                \
    } while (0)

#define DESTRUCTIVE_UPDATE(a, dst)                                   \
    do {                                                             \
        if (is_##a##_overlaps_##dst) {                               \
            if (is_##dst##_null &&                                   \
                /* a is between dst and the first null */            \
                a##_overlaps_##dst < dst##_len &&                    \
                /* a is before dst or has a new null */              \
                (!is_##a##_null || a##_overlaps_##dst > -a##_len)) { \
                is_##a##_null = true;                                \
                a##_len = dst##_len - a##_overlaps_##dst;            \
            } else {                                                 \
                RECOMPUTE(a);                                        \
            }                                                        \
        }                                                            \
    } while (0)



Function Calls. Function calls are treated by passing additional parameters in
a dynamic array pointed to by a global variable. The array contains the length
and allocated size of every string parameter and the overlapping information of
every pair of string parameters. Similarly, upon return, the information is passed
back along with additional information on the return value and its overlapping
with the string parameters. Cases where either the callee or the caller is not
instrumented are handled similarly to a previously published approach.

Derived Expressions. To enhance the precision of the static analysis, the
following expressions are replaced by equivalent expressions that use the
instrumented variables:

*str == 0           ≡  (is_str_null && str_len == 0)
*str == c (c ≠ 0)   ≡  ((!is_str_null || str_len > 0) && *str == c)
p - q               ≡  p_overlaps_q
p > q               ≡  p_overlaps_q > 0

The first two are conditions on the length of the string. Pointer comparisons and 
subtractions are replaced by expressions over the appropriate overlap variable. 
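The first two replacements can be checked on concrete strings. The hypothetical helpers below compute the derived forms from the instrumentation values; the point of the rewrite is that the resulting conditions mention str_len, which the integer analysis tracks, rather than a memory read it cannot see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Derived form of `*str == 0`: (is_str_null && str_len == 0). */
static bool derived_eq_zero(bool is_str_null, size_t str_len)
{
    return is_str_null && str_len == 0;
}

/* Derived form of `*str == c` for c != 0:
   ((!is_str_null || str_len > 0) && *str == c).
   The left operand short-circuits before *str is read. */
static bool derived_eq_char(const char *str, bool is_str_null,
                            size_t str_len, char c)
{
    assert(c != '\0');
    return (!is_str_null || str_len > 0) && *str == c;
}
```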



Proving that our transformation preserves the semantics of clean C programs
involves establishing certain invariants on the values of the auxiliary
instrumentation variables. For example, in the transformations shown in Table 1,
the value of src_len is the same as the result of the call to strlen().

3 Static Analysis 

Our static algorithm detects all potentially unclean string manipulations. It
analyzes the instrumented program and identifies assert statements that may
be violated. The algorithm is conservative, i.e., if there exists an input on which
an assertion in the instrumented program is violated, the algorithm will identify
this assertion. However, it may yield false alarms by reporting assertions in the
instrumented program that are never violated.

Our algorithm is iterative. It simultaneously processes Boolean and integer
variables by analyzing the program control flow graph and computing
conservative information at every node. It then¹ checks whether any of the
assertions may be violated and produces appropriate messages.

To analyze Boolean variables, a three-valued domain is used [?]. The value ⊤
results from joining true and false. The analysis handles Boolean assignments
and conditions.
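A three-valued domain of this kind can be sketched as an enumeration whose join yields ⊤ (here AB_TOP) whenever the two arguments differ. The negation and conjunction below are an assumption of this sketch (a Kleene-style treatment); the text itself only fixes the join.

```c
#include <assert.h>

/* Three-valued abstract Booleans: AB_TOP means "may be true or false". */
typedef enum { AB_FALSE, AB_TRUE, AB_TOP } abool;

/* Join: equal values stay, different values go to TOP. */
static abool ab_join(abool a, abool b)
{
    return a == b ? a : AB_TOP;
}

/* Abstract negation: TOP stays TOP. */
static abool ab_not(abool a)
{
    if (a == AB_TOP)
        return AB_TOP;
    return a == AB_TRUE ? AB_FALSE : AB_TRUE;
}

/* Abstract conjunction: a definite false operand decides the result. */
static abool ab_and(abool a, abool b)
{
    if (a == AB_FALSE || b == AB_FALSE)
        return AB_FALSE;
    if (a == AB_TRUE && b == AB_TRUE)
        return AB_TRUE;
    return AB_TOP;
}
```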



buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_overlaps_buf = BUFSIZ - cp_alloc
cp_alloc ≥ cp_len + 1
cp_alloc ≤ BUFSIZ - 2
cp_len ≥ 0
i = 1

(a)

buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_overlaps_buf = BUFSIZ - cp_alloc
cp_alloc ≥ cp_len + 1
cp_alloc ≤ BUFSIZ - 1 - i
cp_len ≥ 0
i ≥ 1
i ≤ 2

(b)

buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_overlaps_buf = BUFSIZ - cp_alloc
cp_alloc ≥ cp_len + 1
cp_len ≥ 0
i ≤ BUFSIZ - 1 - cp_alloc
i ≥ 1

(c)

Fig. 3. Application of widening when analyzing the loop in insert_long. (c)
displays the result of the widening operation applied to (a) and (b), the static
information arising at the first and second iterations, respectively.



¹ The implementation carries out the check simultaneously.
3.1 Integer Analysis

We use a linear relation analysis algorithm that discovers linear inequalities
among numerical variables [?]. This method identifies linear inequalities of the
form $\sum_{i=1}^{n} c_i x_i + b \geq 0$, where each $x_i$ is a variable and the $c_i$ and $b$ are constants.

In the analysis of the function insert_long (see Figure ?(c)), the algorithm
detects the inequalities shown in Figure 3(c) upon entry into the loop body
(after at least one execution of the loop body²). For simplicity, we only show the
relevant (in)equalities. The first five (in)equalities are derived from the
declarations of the string variables and from the function’s precondition. The last
two inequalities constrain the loop induction variable i. Note that i is always less
than BUFSIZ, which is important for avoiding false alarms when accessing
temp[i] and buf[i].

Figure 4 demonstrates how our static algorithm identifies the unclean expression
strcpy(&temp[i], "(long)") in the function insert_long. For clarity, we omit
constraints that are irrelevant to the assertion being verified. From this
information, one can see that when the allocation size of the parameter cp is
less than or equal to 6, a buffer overflow occurs at l3. For each potential
cleanness violation, the algorithm reports the constraints on the variables (of
the instrumented program) under which the violation occurs. Of course, a
practical tool would need to report these errors in terms of the original program.



l3.1:  tmp1 = temp + i;
cond3: assert(tmp1_alloc > 6);
       error: the assertion may be violated when:
              i = BUFSIZ - cp_alloc
              tmp1_alloc = cp_alloc
              cp_alloc ≤ 6
l3:    strcpy(tmp1, "(long)");

Fig. 4. A report of a potentially unclean expression strcpy(&temp[i],
"(long)") in insert_long (see Figure ?(c)) by the static analysis algorithm.
(The assignment to tmp1 is added by the simplifier.)



Technically, linear inequalities are represented by a closed convex polyhedron
(a polyhedron, for short), which has two representations: (i) a system of linear
inequalities $P = \{X \mid AX \geq B\}$, where $A$ is an $m \times n$ matrix, $B$ is an $m$-vector,
and $n$ is the number of numerical variables; and (ii) a system of generators, which
can be viewed as a geometric representation. See [?] for more information.
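To make the constraint representation concrete, the sketch below stores each inequality as a coefficient row c and constant b with c·x + b ≥ 0 (so b is the negated entry of B in the corresponding row of AX ≥ B) and tests whether an integer point lies in the polyhedron. It encodes the integer constraints of Fig. 3(c) over (i, cp_alloc, cp_len) using the hypothetical value BUFSIZ = 512; the analysis itself keeps BUFSIZ symbolic and manipulates whole polyhedra, not individual points.

```c
#include <assert.h>
#include <stdbool.h>

#define NVARS 3              /* x = (i, cp_alloc, cp_len)                  */
#define BUFSIZ_ASSUMED 512L  /* hypothetical; the analysis keeps it symbolic */

/* One row: c[0]*x[0] + c[1]*x[1] + c[2]*x[2] + b >= 0. */
typedef struct { long c[NVARS]; long b; } constraint_t;

static bool satisfies(const constraint_t *con, const long x[NVARS])
{
    long s = con->b;
    for (int k = 0; k < NVARS; k++)
        s += con->c[k] * x[k];
    return s >= 0;
}

/* A point lies in the polyhedron iff it satisfies every row. */
static bool in_polyhedron(const constraint_t *sys, int m, const long x[NVARS])
{
    for (int r = 0; r < m; r++)
        if (!satisfies(&sys[r], x))
            return false;
    return true;
}

/* Constraints of Fig. 3(c) that involve i, cp_alloc, and cp_len. */
static const constraint_t fig3c[] = {
    {{ 0,  1, -1}, -1},                  /* cp_alloc >= cp_len + 1     */
    {{ 0,  0,  1},  0},                  /* cp_len >= 0                */
    {{-1, -1,  0}, BUFSIZ_ASSUMED - 1},  /* i <= BUFSIZ - 1 - cp_alloc */
    {{ 1,  0,  0}, -1},                  /* i >= 1                     */
};
```

With i = 10, cp_alloc = 100, cp_len = 5 all four rows hold; with i = 412 the third row fails, since i + cp_alloc would exceed BUFSIZ - 1.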

² Our simplification step transforms all loops into do-while loops with surrounding
conditional statements.
Upon termination of the iterative algorithm, the polyhedron at every control
flow node conservatively represents the inequalities that are guaranteed to hold
whenever control reaches the respective point. The polyhedron at function
entry is ⊤, indicating that no inequalities are known to hold. We then proceed
as follows:



Assignments. Every assignment generates a new polyhedron from the
polyhedron before the statement. Linear assignments of the form $x = \sum_{i=1}^{n} c_i x_i + b$
are handled by substituting $\sum_{i=1}^{n} c_i x_i + b$ for $x$ in the polyhedron before the
assignment and then normalizing. Other assignments are handled conservatively
by removing the inequalities on the affected variables from the polyhedron.
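The two cases can be sketched on a list of constraints. Assuming the assignment is the invertible shift x := x + k, substituting the old value x - k into each row only changes its constant term; for a non-linear assignment, the sketch discards every row that mentions x, mirroring the conservative treatment described above. General non-invertible linear assignments require a projection (for example, Fourier–Motzkin elimination) and are not covered here.

```c
#include <assert.h>
#include <stdbool.h>

#define NV 2  /* hypothetical variables: x[0] = i, x[1] = cp_alloc */

/* One row c[0]*x[0] + c[1]*x[1] + b >= 0; live == false means dropped. */
typedef struct { long c[NV]; long b; bool live; } lin_t;

/* x[v] := x[v] + k. The old value is x[v] - k, so each row
   c_v*(x[v] - k) + ... + b >= 0 only changes its constant to b - c_v*k. */
static void assign_shift(lin_t *sys, int m, int v, long k)
{
    for (int r = 0; r < m; r++)
        if (sys[r].live)
            sys[r].b -= sys[r].c[v] * k;
}

/* Non-linear assignment to x[v]: forget every row that mentions x[v]. */
static void assign_havoc(lin_t *sys, int m, int v)
{
    for (int r = 0; r < m; r++)
        if (sys[r].live && sys[r].c[v] != 0)
            sys[r].live = false;
}
```

Starting from 0 ≤ i ≤ 5 and cp_alloc ≥ 1, the shift i := i + 1 yields 1 ≤ i ≤ 6 and leaves the cp_alloc row unchanged; a subsequent non-linear assignment to i drops both rows on i.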



buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_overlaps_buf = 0
cp_alloc = BUFSIZ
cp_len ≤ BUFSIZ - 1
cp_len ≥ 0
i = 0

(a)

buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_alloc = BUFSIZ - i
cp_overlaps_buf = i
cp_alloc ≥ cp_len + 1
cp_alloc ≤ BUFSIZ - 1
cp_len ≥ 0

(b)

buf_alloc = BUFSIZ
temp_alloc = BUFSIZ
cp_overlaps_buf = i
cp_alloc = BUFSIZ - i
cp_len ≥ 0
i ≥ 0

(c)

Fig. 5. The result of the join operation at l3 in the analysis of insert_long. (c)
shows the result of joining (a), which corresponds to the case where the loop was
not executed, and (b), which assumes that the loop was executed.



Program Conditions. Analyzing conditions is essential for avoiding many false
alarms. A non-linear condition is handled conservatively by assuming that both
branches may occur. A linear condition cond is handled by: (i) creating a
polyhedron that represents cond and intersecting it with the polyhedron before
the conditional statement to conservatively determine the polyhedron at the
true-branch, and (ii) creating a polyhedron that represents !cond and intersecting
it with the polyhedron before the conditional statement to conservatively
determine the polyhedron at the false-branch. The polyhedra for cond and !cond
are conservatively determined. Since we only have to deal with integers,
negations can be handled more precisely than for general numerical variables.
For example, we have the following equivalence:

$$\neg\Big(\sum_{i=1}^{n} c_i x_i + b \geq 0\Big) \;\equiv\; \Big(\sum_{i=1}^{n} (-c_i)\, x_i + (-b-1) \geq 0\Big)$$
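The integer negation rule can be checked mechanically. The sketch below builds the complement constraint (-c)·x + (-b - 1) ≥ 0 and confirms on a small grid of integer points that exactly one of cond and !cond holds at each point; over the rationals the complement would be a strict inequality, which a closed polyhedron cannot express.

```c
#include <assert.h>
#include <stdbool.h>

#define NV 2

/* One constraint c[0]*x[0] + c[1]*x[1] + b >= 0. */
typedef struct { long c[NV]; long b; } lin_t;

/* Over the integers: !(c.x + b >= 0) <=> c.x + b <= -1
                                      <=> (-c).x + (-b - 1) >= 0. */
static lin_t negate_int(lin_t e)
{
    lin_t n;
    for (int k = 0; k < NV; k++)
        n.c[k] = -e.c[k];
    n.b = -e.b - 1;
    return n;
}

static bool holds(lin_t e, const long x[NV])
{
    long s = e.b;
    for (int k = 0; k < NV; k++)
        s += e.c[k] * x[k];
    return s >= 0;
}
```

For example, negating x0 - x1 ≥ 0 gives -x0 + x1 - 1 ≥ 0, i.e., x1 ≥ x0 + 1, which is exactly x0 < x1 on integers.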



Control Flow Merges and Loops. When two control flow edges merge, a new
polyhedron “including” both polyhedra (their convex hull) is computed. Figure 5
shows the result of the join operation at l3 in the analysis of insert_long.
Figure 5(a) shows the polyhedron corresponding to the case where the loop is not
executed. In this case, i is 0 and cp_overlaps_buf is 0, indicating that cp and buf
point to the same location. Figure 5(b) shows the polyhedron at the end of the
execution of the loop. Here cp and &buf[i] point to the same location, which
implies that i is equal to cp_overlaps_buf. The result of the join operation is
shown in Figure 5(c). The resulting polyhedron also covers the case where i is
greater than zero and equal to cp_overlaps_buf.
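Computing the polyhedral join (the convex hull) requires a polyhedra library. As a much weaker stand-in, and not the domain used by this analysis, the interval sketch below joins bounds variable by variable. Joining [0, 0] (loop not executed) with [1, +∞) gives [0, +∞), matching i ≥ 0 in Fig. 5(c), but intervals cannot retain the relation i = cp_overlaps_buf, which is exactly what the polyhedron keeps.

```c
#include <assert.h>
#include <limits.h>

/* Interval [lo, hi]; LONG_MIN and LONG_MAX stand for -infinity and +infinity. */
typedef struct { long lo, hi; } itv;

/* Join: the smallest interval containing both arguments. */
static itv itv_join(itv a, itv b)
{
    itv r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}
```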

Loops present a more involved situation. To guarantee that the algorithm
terminates, a widening operation is applied on loop back edges. The resulting
polyhedron is defined by those constraints which are satisfied by both polyhedra.
Figure 3 demonstrates a widening operation at the loop in insert_long.
Figure 3(a) shows the polyhedron describing the first execution of the loop, where
i equals 1. Figure 3(b) shows the polyhedron describing the second execution
of the loop, where i is between 1 and 2. The result of the widening is shown in
Figure 3(c), in which the bounds on i are relaxed: i is at least 1 and less than
BUFSIZ - cp_alloc.

Widening operations may cause a loss of precision, which in turn may lead
to false alarms. As a partial remedy, a narrowing operation is performed by
analyzing the loop body again without a widening operation.
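The interval analogue of widening and narrowing can be sketched the same way (again, not the polyhedral operators used by the analysis). Widening keeps a bound only if the next iterate still satisfies it, so [1, 1] followed by [1, 2], the values of i in Fig. 3(a) and 3(b), widens to [1, +∞); narrowing then restores a finite bound from a second pass over the loop body. The bound 99 in the usage below stands for a hypothetical loop guard i < 100. Unlike this sketch, the polyhedral widening in Fig. 3(c) keeps the relational bound i ≤ BUFSIZ - 1 - cp_alloc.

```c
#include <assert.h>
#include <limits.h>

/* Interval [lo, hi]; LONG_MIN and LONG_MAX stand for -infinity and +infinity. */
typedef struct { long lo, hi; } itv;

/* Standard interval widening: a bound survives only if the new iterate
   still satisfies it; otherwise it is pushed to infinity. */
static itv itv_widen(itv old, itv next)
{
    itv r;
    r.lo = (next.lo >= old.lo) ? old.lo : LONG_MIN;
    r.hi = (next.hi <= old.hi) ? old.hi : LONG_MAX;
    return r;
}

/* Standard interval narrowing: only infinite bounds are refined, using the
   result of re-analysing the loop body without widening. */
static itv itv_narrow(itv wide, itv refined)
{
    itv r;
    r.lo = (wide.lo == LONG_MIN) ? refined.lo : wide.lo;
    r.hi = (wide.hi == LONG_MAX) ? refined.hi : wide.hi;
    return r;
}
```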



4 Experimental Results 

To investigate the usability of our approach in terms of false alarms and cost, 
we implemented our algorithm for a large subset of C. 



4.1 User Annotations 

As our prototype implementation analyzes one function at a time, we allow
function prototypes to be annotated with preconditions and postconditions in the
style of iContract [?] and ESC/Java [?]. Preconditions and postconditions are
written as C expressions without function calls but with the following extra
built-in expressions on strings:
len(str): the length of string variable str
is_null(str): is str null-terminated?
alloc(str): the allocation size of string variable str
is_alloc(str): is str allocated?
is_overlap(a,str): do a and str overlap?
overlaps(a,str): the overlap index of a from str

In addition, for ease of annotating, there are some shorthand annotation
expressions. For example:

string(s)          = (is_alloc(s) && is_null(s) && len(s) >= 0 &&
                      len(s) < alloc(s))
overlap_leq(a,b,i) = (is_overlap(a,b) && overlaps(a,b) <= i &&
                      alloc(b) == alloc(a) + overlaps(a,b))
overlap_geq(a,b,i) = (is_overlap(a,b) && overlaps(a,b) >= i &&
                      alloc(b) == alloc(a) + overlaps(a,b))

Postcondition expressions use the special syntax pre@e to denote the value
of the expression e at function entry. Thus, the postcondition x == pre@x + 1
indicates that the value of the variable x after the call is equal to its value
before the call plus one. Finally, postcondition expressions can use a designated
variable return denoting the return value of the function.
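The shorthand annotations are ordinary Boolean expressions over the built-ins, so they can be written as C predicates over shadow values. The sketch below uses hypothetical sinfo and oinfo records; with a 512-byte buf and cp == buf + 10 (so alloc(cp) == 502 and overlaps(cp, buf) == 10), the values satisfy the precondition shape of Fig. 6 for a hypothetical BUFSIZ of 512.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shadow values of a string variable. */
typedef struct { bool is_alloc, is_null; long len, alloc; } sinfo;

/* Hypothetical overlap record for a and b, meaning a == b + overlaps. */
typedef struct { bool is_overlap; long overlaps; } oinfo;

/* string(s) */
static bool string_ok(sinfo s)
{
    return s.is_alloc && s.is_null && s.len >= 0 && s.len < s.alloc;
}

/* overlap_leq(a,b,i) */
static bool overlap_leq(sinfo a, sinfo b, oinfo ab, long i)
{
    return ab.is_overlap && ab.overlaps <= i &&
           b.alloc == a.alloc + ab.overlaps;
}

/* overlap_geq(a,b,i) */
static bool overlap_geq(sinfo a, sinfo b, oinfo ab, long i)
{
    return ab.is_overlap && ab.overlaps >= i &&
           b.alloc == a.alloc + ab.overlaps;
}
```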



PRE( string(cp) && alloc(cp) <= BUFSIZ &&
     overlap_leq(cp,buf,BUFSIZ-1) && overlap_geq(cp,buf,0) )
char* insert_long(char *cp)
POST( len(cp) == pre@len(cp) + 6 && return == pre@cp + 6 )

Fig. 6. The annotation of insert_long.



Annotations can be written in a separate header file with the function pro- 
totype. The analysis conservatively checks the precondition expression when an 
invocation of a function is processed. Then, it assumes that the postcondition 
expression holds after the call. The precondition expression is assumed to hold 
when the body of the function is analyzed, and a warning is issued when the 
postcondition is not guaranteed to hold at the exit of the function. 
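The static treatment mirrors what a runtime contract would do. As a dynamic analogue only (the analysis performs these steps statically), the sketch below checks a hypothetical precondition at the call site, saves the entry value that pre@ refers to, and checks a postcondition of the form x == pre@x + 1, as in the example above, at function exit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PRE: x points to an int that can be incremented safely. */
static bool incr_pre(const int *x)
{
    return x != NULL && *x < INT_MAX;
}

/* Body of a function annotated with POST( *x == pre@(*x) + 1 ). */
static void incr(int *x)
{
    const int pre_x = *x;     /* the value pre@ refers to, saved at entry */
    *x = *x + 1;
    assert(*x == pre_x + 1);  /* postcondition checked at function exit   */
}

/* Call site: check PRE before the call; afterwards POST may be assumed. */
static void call_incr(int *x)
{
    assert(incr_pre(x));
    incr(x);
}
```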

The annotated prototype for the function insert_long (shown in Figure ?(c))
is given in Figure 6. This annotation is one of the most difficult annotations
written in our study. Usually, annotations indicate string data types for
parameters and return values. In this example, the precondition indicates that cp
is a string whose allocation size is not greater than BUFSIZ. The last two overlap
expressions in the precondition indicate that cp may only point to locations within
buf. The postcondition of this function is straightforward.
4.2 Prototype Implementation

Our algorithm was implemented for a subset of C. The first phase is the simplifier,
implemented under SUIF [?]. It generates a new C file. As described in Section ?,
every expression that may be unclean is transformed into a statement. In
addition, for ease of implementation, the following simplifications are applied:
(i) complex expressions are replaced by a sequence of simpler expressions;
(ii) function calls in expressions are transformed into statements; (iii) arguments
in function calls are transformed into simple expressions (variables or constants);
and (iv) all types of loop statements are replaced by do-while statements. The
simplified C file is annotated and becomes the input to the transformation phase,
implemented under EDG’s C back-end [?].

The pre-transformation step (described in Section 2.2) marks which string
variables may overlap and generates a control flow graph. The main idea is to
generate a “slice” of the program that includes only the parts of the program
relevant to the analysis, together with the control flow statements. The slice
includes string-related integer variables. An integer variable i is string-related if
it is used: (i) as an index to a string variable, e.g., str[i]; (ii) in a pointer
arithmetic expression, e.g., str+i; (iii) in a call to a function returning a string
variable, e.g., malloc(i); or (iv) as a parameter to a function that also has a
string parameter, e.g., strncpy(dst, src, i).

The static analysis phase is implemented in C using the polyhedra library [?].
The final output of the static analysis is a list of potentially unclean expressions.
The analysis provides a description of potentially violated assert statements in
the format shown in Figure 4. The instrumented program provides a conceptual
tool for validating whether an error message is a false alarm, i.e., an assertion
that is flagged by the static analysis algorithm but cannot be violated by any
program input.

When our static analysis algorithm identifies an assertion violation, it flags an
error (in the form shown in Figure 4). After reporting the error, it (optimistically)
assumes that the assertion condition holds on subsequent execution paths, to
avoid repetitive error messages. This is in line with the instrumented program,
which aborts at the first violated assertion. For example, our algorithm does not
flag the expression cp+6 in Figure ?(c) as unclean. This expression is unclean
when the allocation size of the buffer pointed to by cp is less than 6. In this case,
strcpy(&temp[i], "(long)") at l3 is also unclean and must be executed before
cp+6. However, the expressions at l3 and l4 are both detected as unclean, since
they can be violated on different inputs.

4.3 Results 

Table 3 presents the analysis results. Columns 3-5 provide source information:
(i) the number of lines of simplified code, (ii) the number of nodes in the analyzed
CFG, and (iii) the number of string variables in the function, including global
variables. The ratio between LOC and CFG indicates the fraction of the program
that is analyzed. In some cases, such as backup and insert_long, all statements
are string manipulations. In others, such as main, only a small fraction is analyzed.
Table 3. The Experimental Results. The column named “Annotations” indicates
the level of difficulty of annotating the function: S - simple, M - moderate,
and D - difficult.



                           Source code info                Static Analysis                   Messages
App. Function              LOC  CFG  Strings  Annotations  Overlap  Vars  CPU sec  Space MB  False Alarms  Errors
?    flush                 19   11   6        S            15       30    0.4      1.2       0             0
     getchar               33   31   8        D            15       36    3.0      5.1       0             3
?    backup                22   28   6        D            6        22    0.6      0.7       0             0
     getstr                50   42   7        D            6        28    8.0      9.2       0             0
     main                  403  157  10       S            10       44    309.9    99.5      0             0
     extend_re             30   14   11       S            28       54    3.2      9.6       0             0
?    init                  45   19   3        S            1        9     0.3      0.3       0             0
?    coutline              22   12   10       M            28       61    1.4      4.4       0             0
     m_short               187  109  14       M            78       125   279.8    242.7     0             5
     fprint_pascal_string  29   10   3        S            1        9     0.1      0.1       0             2
     null_terminate        11   7    2        S            0        5     0.1      0.1       0             2
     space_terminate       12   9    2        S            0        5     0.1      0.1       0             0
?    extend_filename       17   13   4        S            1        11    0.3      0.6       0             2
?    remove_newline        18   11   12       S            29       56    0.8      3.4       0             0
     insert_long           21   25   14       D            78       110   19.6     38.8      0             2
     join                  44   27   13       M            29       60    8.9      16.0      3             1
     skip_balanced         72   24   9        S            28       53    2.8      5.2       1             0
     bare                  85   57   9        M            28       53    13.8     17.1      0             2

Column 6 indicates the level of difficulty of annotating the functions. For most
functions it was rather straightforward (marked as “Simple”). In some functions,
additional preconditions, such as on the allocation size of a parameter, are
needed to avoid false alarms (marked as “Moderate”). Certain functions, such
as insert_long shown in Figure ?(c), required preconditions on the overlap
between parameters and/or global variables (marked as “Difficult”).

Columns 7-10 show statistics on the analysis: (i) the number of overlap
variables³; (ii) the total number of integer variables in the analysis, which
includes two variables per string, the string-related integer variables, and the
overlap variables; (iii) the elapsed CPU time of the analysis in seconds on a
Pentium 366 with 128 MB running Windows 2000; and (iv) the maximum size
of allocated memory during the analysis in MB.
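The number of overlap variables is bounded by one variable per unordered pair of string variables, n(n - 1)/2. As a quick arithmetic check against Table 3: flush has 6 string variables, giving a bound of 15, which is its Overlap entry; insert_long has 14 string variables, giving a bound of 91, of which 78 are used.

```c
#include <assert.h>

/* Upper bound on overlap variables: one per unordered pair of the
   n string variables, i.e., n*(n-1)/2. */
static unsigned max_overlap_vars(unsigned n)
{
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```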



³ The maximum number of overlap variables is n(n - 1)/2, where n is the number
of string variables. Only flush requires that many variables.
From a user’s point of view, the most important information appears in columns
11-12, which show the number of false alarms vs. the number of errors found. We
verified the errors by running the program with appropriate inputs. A total of 19
errors were detected. Ten errors are due to unsafe pointer arithmetic resulting
in a pointer beyond the bounds of the buffer. These errors stem from
assumptions that a certain character appears in the string, such as in the function
null_terminate shown in Figure ?, which assumes that there is a blank in
the string. Five errors are due to unsafe calls to standard C functions, such as
the calls to strcpy() in insert_long. The remaining four errors are updates
of a single character outside the buffer bounds, such as the expression *s = 0 in
null_terminate.

In the function join, the analysis produced three false alarms due to a complex
condition in a for loop. In this case the widening operation causes the loss
of important constraints. The false alarm in skip_balanced occurs because the
function relies on being invoked with an input parameter that contains a balanced
number of parentheses, which is verified beforehand by a call to a different
function. This example demonstrates that in some cases it is hard to separate
cleanness from correctness: in order to show that this function is clean, we need
to verify correctness, i.e., that the implementation correctly checks that the input
string contains a balanced number of parentheses. Fortunately, in most of the
analyzed examples this is not the case, i.e., cleanness does not depend on correctness.

5 Conclusions 

We have shown that real software defects can be identified by conservative static
analysis tools in a realistic subset of C. The analysis produces very few false
alarms. It guarantees the absence of certain defects, relieving the user of the need
to test for them. The analysis combines pointer and integer analysis, resulting
in rather precise information.

In the future we intend to analyze the instrumented program with different 
integer analysis techniques including interval analysis and other linear relation 
analyses. We also plan to generalize our method for a larger subset of C. Another 
fruitful area of research is automatically generating procedure annotations. 
Abstract. This paper describes the precise specification, design, analysis,
implementation, and measurements of an efficient algorithm for solving
regular tree grammar based constraints. The particular constraints
are for dead-code elimination on recursive data, but the method used for 
the algorithm design and complexity analysis is general and applies to 
other program analysis problems as well. The method is centered around 
Paige’s finite differencing, i.e., computing expensive set expressions incre- 
mentally, and allows the algorithm to be derived and analyzed formally 
and implemented easily. We propose higher-level transformations that 
make the derived algorithm concise and allow its complexity to be an- 
alyzed accurately. Although a rough analysis shows that the worst-case 
time complexity is cubic in program size, an accurate analysis shows that 
it is linear in the number of live program points and in other parame- 
ters, including mainly the arity of data constructors and the number of 
selector applications into whose arguments the value constructed at a 
program point might flow. These parameters explain the performance of 
the analysis in practice. Our implementation also runs two to ten times 
as fast as a previous implementation of an informally designed algorithm. 

1 Introduction 

Regular tree grammar based methods are important for program analysis,
especially for analyzing programs that use recursive data structures.
Basically, a set of grammar-based constraints is constructed from the program
and a user query and is then simplified according to a set of simplification rules 
to produce the solution. Usually, the constraints are constructed in linear time 
in the size of the program, and the efficiency of the analysis is determined by 
the constraint-simplification algorithms. 

This paper describes the precise specification, design, analysis, implementa- 
tion, and measurements of an efficient algorithm for solving regular tree grammar 
based constraints. The particular constraints are for dead-code elimination on 
recursive data, but the method used for the algorithm design and complexity 
analysis is general and applies to other program analyses as well. 

The method is centered around Paige’s finite differencing [?], i.e.,
computing expensive set expressions incrementally. It starts with a fixed-point 

* This work is supported in part by ONR under grants N00014-99-1-0132,
N00014-99-1-0358, and N00014-01-1-0109, by NSF under grants CCR-9711253 and
CCR-9876058, and by a Motorola University Partnership in Research Grant.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 213-?, 2001.
specification of the problem, then applies (1) dominated convergence at the
higher level [?] to transform fixed-point expressions into loops, (2) finite
differencing [?] to transform expensive set expressions in loops into incremental
operations, and (3) real-time simulation at the lower level [?] to transform
sets and set operations to use efficient data structures. This method allows the
algorithm to be derived and analyzed formally and implemented easily.

We first give a precise fixed-point specification of the problem. We then 
transform it into a loop and apply finite differencing completely systematically, 
making all the steps explicit. At the higher level, we study new transformations 
that make the derived algorithm concise and allow its complexity to be analyzed 
accurately. The complexity analysis captures the exact contribution of each pa- 
rameter. In particular, although a rough analysis shows that the worst-case time 
complexity is cubic in program size, an accurate analysis shows that it is linear 
in the number of live program points and in other parameters, including mainly 
the arity of data constructors and the number of selector applications into whose 
arguments the value constructed at a program point might flow. These param- 
eters explain the performance of the analysis in practice. At the lower level, we 
show that real-time simulation using based representation applies only par- 
tially to our application, and we discuss data structure choices and the trade-offs. 
In particular, our accurate complexity analysis at the higher level suggests that
combination with unbased representation works well in our application, and our 
experiments support this. Our implementation runs two to ten times as fast as 
a previous implementation of an informally designed algorithm [?].
The main contributions of this work include:

(1) the application of a powerful, systematic transformational design method- 
ology that leads from a precise high-level fixed-point specification of a non- 
trivial problem to a highly efficient algorithmic solution, 

(2) the identification of parameters in problem instances and the precise expres- 
sion of the algorithm complexity in terms of these parameters, and 

(3) the implementation and experiments that help confirm the accuracy of the 
complexity analysis and compare the efficiency of the algorithm with that of 
an informally designed algorithm. 

It is not the goal of this paper to show a drastically new algorithm or algorithm 
design method. Instead, since program analysis is a central recurring task in 
all kinds of program manipulation, and static analysis is naturally described as 
computing fixed points, the goal is to show the systematic nature of the design 
method in the hope that it can be more widely used for developing analysis algo- 
rithms, to allow easier correctness proof, algorithm understanding, performance 
analysis and comparison, and implementation. At the same time, through such 
usage, one may further improve the design method, for example, as we study 
the transformations and accurate complexity analyses enabled by Theorem ?.

2 Problem Specification 

The Specification from the Application. We first look at the grammar
constraints and the simplification algorithm for the dead-code elimination
application in [?]¹. There, regular tree grammars, called liveness patterns,
represent projection functions that project out components of values and parts
of programs that are of interest.

The grammar constraints constructed from a given program or given in a 
user query consist of productions of the following standard forms: 



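$$
\begin{array}{lll}
N \to d & \text{dead form,} & \text{where } d \text{ is a special constant}\\
N \to l & \text{live form,} & \text{where } l \text{ is a special constant}\\
N \to c(N_1, \ldots, N_k) & \text{constructor form,} & \text{where } c \text{ is from a set of constructors and may have arity } 0
\end{array}
$$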



and the following extended forms: 



$$
\begin{array}{ll}
N' \to N & \text{copy form}\\
N' \to c_i^{-1}(N) & \text{selector form}\\
N' \to [N]\,R' & \text{conditional form,}
\end{array}
$$

where R' is of forms l, c(N1, ..., Nk), and N''. Symbols d, l, and c’s are
terminals, and symbols N, N1, ..., Nk, N', N'' are nonterminals. The extended
forms are simplified away using the algorithm below, where R is of forms l and
c(N1, ..., Nk), which are called good forms.



input: productions P of standard forms and extended forms;
repeat
    if P contains N' → N and N → R, add N' → R to P;
    if P contains N' → cᵢ⁻¹(N) and N → l, add N' → l to P;
    if P contains N' → cᵢ⁻¹(N) and N → c(N1, ..., Nk), add N' → Ni to P;
    if P contains N' → [N]R' and N → R, add N' → R' to P;
until no more productions can be added;
output: the resulting productions in P that are of good forms.
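The loop above can be prototyped directly. The naive C sketch below uses a hypothetical encoding (nonterminals as small integers, the terminal d as -1 inside constructor arguments, constructors as integer ids, selector indices from 0) and reruns all four rules over all pairs of productions until nothing is added. It has none of the incremental machinery developed in this paper and is meant only to make the rules concrete.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXK 4     /* maximum constructor arity in this sketch */
#define MAXP 256   /* capacity of the production set           */
#define DEAD (-1)  /* the terminal d as a constructor argument  */

/* Right sides l, c(N1,...,Nk), and a nonterminal N (a[0]). */
enum { R_LIVE, R_CONS, R_NT };
typedef struct { int kind, c, k, a[MAXK]; } rhs_t;

/* Forms: N' -> R' (standard or copy), N' -> c_i^-1(n), N' -> [n]R'. */
enum { F_PLAIN, F_SEL, F_COND };
typedef struct { int form, lhs, c, i, n; rhs_t r; } prod_t;

static prod_t P[MAXP];
static int np = 0;

static bool eq_rhs(const rhs_t *x, const rhs_t *y)
{
    if (x->kind != y->kind) return false;
    if (x->kind == R_NT) return x->a[0] == y->a[0];
    if (x->kind == R_CONS) {
        if (x->c != y->c || x->k != y->k) return false;
        for (int j = 0; j < x->k; j++)
            if (x->a[j] != y->a[j]) return false;
    }
    return true;
}

static bool has_plain(int lhs, const rhs_t *r)
{
    for (int x = 0; x < np; x++)
        if (P[x].form == F_PLAIN && P[x].lhs == lhs && eq_rhs(&P[x].r, r))
            return true;
    return false;
}

static void add_prod(prod_t p) { assert(np < MAXP); P[np++] = p; }

/* Add lhs -> r unless present; report whether P grew. */
static bool add_plain(int lhs, rhs_t r)
{
    if (has_plain(lhs, &r)) return false;
    prod_t p = { F_PLAIN, lhs, 0, 0, 0, r };
    add_prod(p);
    return true;
}

/* Naive "repeat ... until no more productions can be added". */
static void simplify(void)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int x = 0; x < np; x++) {
            for (int y = 0; y < np; y++) {
                prod_t p = P[x], q = P[y];
                /* q must be a good form N -> R */
                if (q.form != F_PLAIN || q.r.kind == R_NT) continue;
                if (p.form == F_PLAIN && p.r.kind == R_NT && p.r.a[0] == q.lhs) {
                    changed |= add_plain(p.lhs, q.r);         /* N' -> R  */
                } else if (p.form == F_SEL && p.n == q.lhs) {
                    if (q.r.kind == R_LIVE) {
                        changed |= add_plain(p.lhs, q.r);     /* N' -> l  */
                    } else if (q.r.c == p.c && p.i < q.r.k &&
                               q.r.a[p.i] != DEAD) {
                        rhs_t nt = { R_NT, 0, 0, { q.r.a[p.i] } };
                        changed |= add_plain(p.lhs, nt);      /* N' -> Ni */
                    }
                } else if (p.form == F_COND && p.n == q.lhs) {
                    changed |= add_plain(p.lhs, p.r);         /* N' -> R' */
                }
            }
        }
    }
}

/* A program point is live iff its nonterminal has a good form. */
static bool has_good_form(int nt)
{
    for (int x = 0; x < np; x++)
        if (P[x].form == F_PLAIN && P[x].lhs == nt && P[x].r.kind != R_NT)
            return true;
    return false;
}
```

In a small hypothetical example with query N0 → l, a selector production N1 → [N0]cons(N0, d), and N2 → cons₁⁻¹(N1), N3 → cons₂⁻¹(N1), the loop derives N1 → cons(N0, d), N2 → N0, and N2 → l, while N3 receives no good form and is therefore dead.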



Throughout the paper, we use R' to denote right-side forms l, c(N1, ..., Nk), and
N''. We use R to denote right-side good forms l and c(N1, ..., Nk); when R is a
variable whose value could be an N form, it is accompanied by a test to ensure
that its value is a good form.

In the application, extended forms are constructed from programs: for each 
program construct below on the left, the corresponding productions on the right 
are constructed, where a nonterminal associated with (at the left upper corner 
of) a program point denotes the liveness pattern for the values at that point. 



¹ The presentation here includes minor notational changes and simplifications. In
particular, the condition in the first production for a binding expression in [?]
is unnecessary.
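$$
\begin{array}{lll}
\text{function definition:} & f({}^{N_1}v_1, \ldots, {}^{N_n}v_n) = {}^{N}e & N_i \to N' \text{ for } i = 1..n \text{ and for each occurrence } {}^{N'}v_i \text{ in } e\\
\text{data construction:} & {}^{N}c({}^{N_1}e_1, \ldots, {}^{N_n}e_n) & N_i \to c_i^{-1}(N) \text{ for } i = 1..n\\
\text{selector application:} & {}^{N}c_i({}^{N_1}e_1) & N_1 \to [N]\,c(\underbrace{d, \ldots, d}_{i-1}, N, \underbrace{d, \ldots, d}_{n-i}) \text{ for } c \text{ of arity } n\\
\text{tester application:} & {}^{N}c?({}^{N_1}e_1) & N_1 \to [N]\,c(\underbrace{d, \ldots, d}_{n}) \text{ for each possible } c \text{ of arity } n\\
\text{primitive operation:} & {}^{N}p({}^{N_1}e_1, \ldots, {}^{N_n}e_n) & N_i \to [N]\,l \text{ for } i = 1..n\\
\text{conditional:} & {}^{N}\mathbf{if}\ {}^{N_1}e_1\ \mathbf{then}\ {}^{N_2}e_2\ \mathbf{else}\ {}^{N_3}e_3 & N_1 \to [N]\,l,\ \ N_2 \to N,\ \ N_3 \to N\\
\text{binding:} & {}^{N}\mathbf{let}\ v = {}^{N_1}e_1\ \mathbf{in}\ {}^{N_2}e_2 & N_1 \to N' \text{ for each free occurrence of } {}^{N'}v \text{ in } e_2,\ \ N_2 \to N\\
\text{function application:} & {}^{N}f({}^{N_1}e_1, \ldots, {}^{N_n}e_n) & N_i \to [N]\,N_i' \text{ for } i = 1..n,\ \ N' \to N,\\
 & & \quad \text{where } f({}^{N_1'}v_1, \ldots, {}^{N_n'}v_n) = {}^{N'}e
\end{array}
$$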



Standard forms are given in user queries to indicate program points of interest 
and liveness patterns of interest at those points. For example, a user query N → l
indicates that the entire value at point N is of interest. Simplification aims to add 
standard forms that capture the effects of extended forms. After simplification, 
program points whose associated nonterminals do not have a right-side good form 
are identified as dead. Appendix ? gives a small example program together with
the constructed grammar, a user query, and the simplification result. 

All the production forms here are the same as or similar to those studied by
many researchers. For example, standard forms are as in [?], copy forms are
common in grammars, selector forms first appeared in [22], and conditional forms
have counterparts in [?]. Overall, the constraints here extend those by Jones
and Muchnick [22].



Notation. We use a set-based language. It is based on SETL, extended
with a fixed-point operation by Cai and Paige [?]; we allow sets of heterogeneous
elements and extend the language with pattern matching. Primitive data types
are sets, tuples, and maps, i.e., binary relations represented as sets of 2-tuples.
Their syntax and the operations on them are summarized below:



{X1, ..., Xn}               a set with elements X1, ..., Xn
[X1, ..., Xn]               a tuple with elements X1, ..., Xn in order
{[X1, Y1], ..., [Xn, Yn]}   a map that maps X1 to Y1, ..., Xn to Yn
{}                          empty set
S ∪ T, S − T                union and difference, respectively, of sets S and T
S with X, S less X          S ∪ {X} and S − {X}, respectively
S ⊆ T                       whether S is a subset of T
X in S, X notin S           whether or not, respectively, X is an element of S
#S                          number of elements in set S
T(I)                        I’th component of tuple T
dom M                       domain of map M, i.e., {X : [X, Y] in M}
M{X}                        image set of X under map M, i.e., {Y : [Z, Y] in M | Z = X}
inv M                       inverse of map M, i.e., {[Y, X] : [X, Y] in M}
We use the notation below for pattern matching against constants and tuples.
The second returns false if X is not a tuple of length n; otherwise, it binds Yi to 
the ith component of X if Yi is an unbound variable, and otherwise, recursively 
tests whether the ith component of X matches Yi, until either a test fails or all 
unbound variables in the pattern become bound. 

X of c, where c is a constant   whether X is constant c
X of [Y1, ..., Yn]              whether X matches pattern [Y1, ..., Yn]

We use the notation below for set comprehension. The Yi's enumerate elements of
the Si's; for each combination of Y1, ..., Yn, if the Boolean value of
expression Z is true, then the value of expression X forms an element of the
resulting set. Each Yi can be a tuple, in which case an enumerated element of Si
is first matched against it.

  {X : Y1 in S1, ..., Yn in Sn | Z}   set former
  {X : Y1 in S1, ..., Yn in Sn}       abbreviation of {X : Y1 in S1, ..., Yn in Sn | true}
  {Y in S | Z}                        abbreviation of {Y : Y in S | Z}

$\mathrm{LFP}_{\subseteq,X}(F(Y), Y)$ denotes the minimum element $Y$, with
respect to the partial ordering $\subseteq$, that satisfies $X \subseteq Y$ and
$F(Y) = Y$. We abbreviate X := X op Y as X op := Y. Also, we abbreviate
X1 := Y; ...; Xn := Y as X1, ..., Xn := Y.
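
For finite sets, the LFP notation can be computed by plain iteration when F is monotone and inflationary at the starting set, because the chain of iterates then only grows. The Python sketch below assumes exactly that. The reachability example and its edge set are hypothetical.

```python
def lfp(g, x):
    """Least Y with x ⊆ Y and g(Y) == Y, computed by iterating g from x.

    Assumes g maps finite sets to finite sets, is monotone, and is
    inflationary at x (x ⊆ g(x)); then x ⊆ g(x) ⊆ g(g(x)) ⊆ ... is an
    increasing chain, and its first repeated value is the least such
    fixed point, provided the chain is finite.
    """
    y = frozenset(x)
    while True:
        y_next = frozenset(g(y))
        if y_next == y:
            return y
        y = y_next


# Example: nodes reachable from node 1, as LFP(F(Y) ∪ Y, Y) from {1}.
edges = {(1, 2), (2, 3), (4, 5)}
reach = lfp(lambda s: s | {b for (a, b) in edges if a in s}, {1})
print(sorted(reach))  # [1, 2, 3]
```
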

A Set-Based Fixed-Point Specification. We represent the right-side R' forms as
follows:

  I                as I, where I is a special constant
  c(N1, ..., Nk)   as [c, [N1, ..., Nk]]                              (2)
  N                as N

and represent the productions as follows:

  N' → R'            as [N', representation of R']
  N' → c_i^{-1}(N)   as [N', c, i, N]                                 (3)
  N' → [N] R'        as [N', N, representation of R']

This representation allows us to distinguish all the production forms by simple
pattern matching against constants and tuples of different lengths. We also
need to tell whether an R' form is an R form or an N form, so for convenience,
we define:

  R' isR = R' of I or R' of [C, T]
  R' isN = not (R' of I or R' of [C, T])
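
One possible Python encoding of representations (2) and (3), shown only to make the tuple-length dispatch concrete. The choice of the string "I" for the special constant and of other strings for nonterminals is an assumption of this sketch.

```python
# Hypothetical Python encoding of the representation above:
#   - nonterminals are strings other than "I";
#   - the special constant I is the string "I";
#   - a constructor form c(N1, ..., Nk) is the pair (c, (N1, ..., Nk)).
I = "I"


def is_r(rhs):
    """R' isR: R' is I or a constructor form [C, T]."""
    return rhs == I or (isinstance(rhs, tuple) and len(rhs) == 2)


def is_n(rhs):
    """R' isN: R' is a nonterminal."""
    return not is_r(rhs)


def kind(p):
    """Distinguish production forms by tuple length, as in (3)."""
    if len(p) == 4:
        return "selector"      # [N', c, i, N]  for N' -> c_i^{-1}(N)
    if len(p) == 3:
        return "conditional"   # [N', N, R']    for N' -> [N] R'
    return "good" if is_r(p[1]) else "copy"   # [N', R'] with R' isR or isN


print(kind(("A", ("cons", ("B", "C")))))  # good
print(kind(("A", "B")))                   # copy
print(kind(("A", "cons", 1, "B")))        # selector
print(kind(("A", "B", "I")))              # conditional
```
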

The simplification algorithm in (1) can be specified as follows. The input is a
set P of productions in the new representation. The repeat-loop computes the
minimum set Q that satisfies $P \subseteq Q$ and $F(Q) \subseteq Q$, where F(Q)
captures, line by line, the four rules in the loop body:

  F(Q) = {[N', R] : [N', N] in Q, [N, R] in Q | R isR} ∪
         {[N', I] : [N', C, i, N] in Q, [N, I] in Q} ∪
         {[N', T(i)] : [N', C, i, N] in Q, [N, [C, T]] in Q} ∪          (5)
         {[N', R'] : [N', N, R'] in Q, [N, R] in Q | R isR}

Since $F(Q) \subseteq Q$ iff $F(Q) \cup Q = Q$, the loop computes

  $$\mathrm{LFP}_{\subseteq,P}(F(Q) \cup Q,\ Q) \tag{6}$$

The output is the set O of resulting productions whose right side is a good
form:

  $$O = \{[N, R] \in \mathrm{LFP}_{\subseteq,P}(F(Q) \cup Q,\ Q) \mid R\ \mathrm{isR}\} \tag{7}$$

Note that $G = \lambda Q.\, F(Q) \cup Q$ is monotone, i.e., if
$Q_1 \subseteq Q_2$ then $G(Q_1) \subseteq G(Q_2)$, and is inflationary at P,
i.e., $P \subseteq G(P)$.
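
As a baseline, the specification (5)-(7) can be run directly by recomputing F(Q) until nothing changes. The Python sketch below does this under a hypothetical encoding: the constant I is the string "I", nonterminals are other strings, constructor forms are (c, (N1, ..., Nk)), and selector indices are 1-based. It is deliberately naive and is only meant as a reference point for the derived algorithms.

```python
I = "I"  # special constant; nonterminals are other strings (assumption)


def is_r(r):
    """R isR: I or a constructor form (c, (N1, ..., Nk))."""
    return r == I or (isinstance(r, tuple) and len(r) == 2)


def F(Q):
    """The four rules of (5), evaluated directly on the production set Q."""
    pairs = [p for p in Q if len(p) == 2]     # [N', R'] productions
    out = set()
    for (n1, n) in pairs:                     # rule 1: N' -> N, N -> R
        out |= {(n1, r) for (m, r) in pairs if m == n and is_r(r)}
    for p in Q:
        if len(p) == 4:                       # [N', C, i, N]
            n1, c, i, n = p
            for (m, r) in pairs:
                if m != n:
                    continue
                if r == I:                    # rule 2: N -> I
                    out.add((n1, I))
                elif is_r(r) and r[0] == c:   # rule 3: N -> C(T(1), ..., T(k))
                    out.add((n1, r[1][i - 1]))  # T(i), with 1-based i
        elif len(p) == 3:                     # [N', N, R']
            n1, n, r1 = p
            if any(m == n and is_r(r) for (m, r) in pairs):  # rule 4
                out.add((n1, r1))
    return out


def simplify_naive(P):
    """O of (7): iterate Q := F(Q) ∪ Q from P, then keep good forms."""
    Q = set(P)
    while True:
        Q_next = F(Q) | Q
        if Q_next == Q:
            break
        Q = Q_next
    return {p for p in Q if len(p) == 2 and is_r(p[1])}


P = {
    ("A", ("cons", ("B", "C"))),   # A -> cons(B, C)
    ("B", I),                      # B -> I
    ("D", "cons", 1, "A"),         # D -> cons_1^{-1}(A)
    ("E", "D"),                    # E -> D
    ("F", "A", I),                 # F -> [A] I
    ("G", "cons", 2, "H"),         # G -> cons_2^{-1}(H)
    ("H", I),                      # H -> I
}
result = simplify_naive(P)
```

On this example P, the result contains A → cons(B, C) and N → I for N in B, D, E, F, G, and H; D and E obtain I only through the chain D → B (from the selector) and E → D.
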

The representation of constraints using SETL tuples is immaterial to the
problem. However, efficient algorithms for simplifying the constraints require
the use of auxiliary maps, as discussed in Section 4, and a uniform notation
helps both for discovering such auxiliary expressions and for systematically
manipulating them.



3 Approach 



The method has three steps: (1) dominated convergence, (2) finite differencing, 
and (3) real-time simulation. 

Dominated convergence 0 transforms a set-based fixed-point specification
into a while-loop. The idea is to perform a small update operation in each
iteration. The fixed-point expression $\mathrm{LFP}_{\subseteq,P}(F(Q) \cup Q,\ Q)$
in (6) is transformed into the following while-loop, making use of
$\lambda Q.\, F(Q) \cup Q$ being monotone and inflationary at P:

  Q := P;
  while exists p in F(Q) − Q                                           (8)
    Q with := p;

This code is followed by

  O := {[N, R] in Q | R isR};                                          (9)
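
The while-loop (8) can be mirrored in Python with a generic monotone function f. This sketch adds one element of F(Q) − Q per iteration and uses a hypothetical reachability relation as F.

```python
def dominated_convergence(f, p):
    """While-loop form (8) of LFP(F(Q) ∪ Q, Q) from P.

    Each iteration adds one element of F(Q) − Q to Q. F is recomputed from
    scratch every time, which is the cost that finite differencing removes.
    Assumes f is monotone on finite sets.
    """
    q = set(p)
    while True:
        new = f(q) - q                  # F(Q) − Q
        if not new:                     # no p exists: Q is the fixed point
            return q
        q.add(next(iter(new)))          # Q with := p, for an arbitrary p


edges = {(1, 2), (2, 3), (3, 1), (4, 5)}
succ = lambda q: {b for (a, b) in edges if a in q}   # F for reachability
print(sorted(dominated_convergence(succ, {1})))      # [1, 2, 3]
```
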

Finite differencing mm transforms expensive set operations in a loop into
incremental operations. The idea is to replace expensive expressions exp1, ...,
expn in a loop LOOP with fresh variables E1, ..., En, respectively, and
maintain the invariants E1 = exp1, ..., En = expn by inserting appropriate
initializations or updates to E1, ..., En at each assignment in LOOP. We denote
the transformed loop as

  Δ E1, ..., En (LOOP)

For our program (8) and (9) from Step 1, the expensive expressions, i.e., the
non-constant-time expressions here, are the one that computes O and others that
are needed for computing F(Q) − Q. We use fresh variables to hold their values.
These variables are initialized together with the assignment Q := P and are
updated incrementally as Q is augmented by p in each iteration. Liu m gives
references to much work that exploited related ideas.
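
As a miniature of finite differencing outside the constraint setting, consider computing the nodes reachable from a start set, where F(Q) = {b : (a, b) in edges | a in Q}. Instead of recomputing F(Q) − Q, the Python sketch below maintains the workset W = (start ∪ F(Q)) − Q and updates it only for the element just added, using a successor index that plays the role of an auxiliary map. This is an illustrative analogue under those assumptions, not the paper's constraint algorithm.

```python
def reach_incremental(edges, start):
    """Finite-differenced reachability.

    Invariant: w == (start ∪ F(q)) − q, where
    F(q) = {b : (a, b) in edges | a in q}.
    """
    succ = {}                               # auxiliary map: a -> {b : (a, b) in edges}
    for (a, b) in edges:
        succ.setdefault(a, set()).add(b)
    q = set()
    w = set(start)                          # invariant holds since F({}) = {}
    while w:                                # while exists x in W
        x = w.pop()                         # W less := x
        q.add(x)                            # Q with := x
        for y in succ.get(x, ()):           # F(q) grows only by succ{x}
            if y not in q:
                w.add(y)
    return q


edges = {(1, 2), (2, 3), (3, 1), (4, 5)}
print(sorted(reach_incremental(edges, {1})))  # [1, 2, 3]
```

Each node enters q at most once and each edge is examined once, from its source, so the work is linear in the number of nodes and edges rather than quadratic.
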

Real-time simulation FT!171 selects appropriate data structures for represent- 
ing sets so that operations on them can be implemented efficiently. The idea is 
to design sophisticated linked structures based on how sets and set elements are 
accessed, so that each operation can be performed in constant time with at most 
a constant (a small fraction) factor of overall space overhead. 
4 Finite Differencing

Identifying Expensive Subexpressions. The output O in (9) and the expensive
subexpressions used to compute O need to be computed incrementally in the loop.
The latter expressions are E1 to E4, one for each of the sets in F(Q) in (5),
and W, the workset:

  E1 = {[N', R] : [N', N] in Q, [N, R] in Q | R isR}
  E2 = {[N', I] : [N', C, i, N] in Q, [N, I] in Q}
  E3 = {[N', T(i)] : [N', C, i, N] in Q, [N, [C, T]] in Q}              (10)
  E4 = {[N', R'] : [N', N, R'] in Q, [N, R] in Q | R isR}
  W  = F(Q) − Q = E1 ∪ E2 ∪ E3 ∪ E4 − Q

Thus, the overall computation becomes

  Δ O, E1, E2, E3, E4, W { Q := P;
                           while exists p in W                          (11)
                             Q with := p; }

Discovering Auxiliary Expressions. To compute E1 to E4 incrementally with
respect to Q with := p, the following auxiliary expressions E11 to E41 are
maintained. Expression E11 maps N to N' if there is a production of the form
N' → N. Expression E21 maps N to N', and expression E31 maps [c, N] to [N', i],
if there is a production of the form N' → c_i^{-1}(N). Expression E41 maps N to
[N', R'] if there is a production of the form N' → [N] R'.

  E11 = {[N, N'] : [N', N] in Q | N isN}
  E21 = {[N, N'] : [N', C, i, N] in Q}
  E31 = {[[C, N], [N', i]] : [N', C, i, N] in Q}                        (12)
  E41 = {[N, [N', R']] : [N', N, R'] in Q}

These expressions are introduced for differentiating E1 to E4, respectively.
For example, E11 is introduced for differentiating E1 in (10) after adding an
element [N, R] to Q: we need to add [N', R] to E1 for all [N', N] in Q, i.e.,
for all N' in E11{N}. These expressions can be obtained systematically from the
set formers in (10): after adding an element corresponding to one enumerator,
create, based on the other enumerator, a map from the variables that are
already bound to the variables not yet bound. For example, consider E3 and
adding an element [N, [C, T]] to Q. Then, for [N', C, i, N] in Q, variables C
and N are bound, and N' and i are not. So, we create a map from [C, N] to
[N', i] for each [N', C, i, N] in Q, which is E31. Now, the overall
computation becomes

  Δ O, E1, E2, E3, E4, W, E11, E21, E31, E41 { Q := P;
                                               while exists p in W      (13)
                                                 Q with := p; }

These auxiliary maps provide, at a high level, the indexing needed to support 
efficient incremental updates. 
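
The auxiliary maps (12) can be built from a production set Q as dictionaries from keys to image sets. This Python sketch assumes a hypothetical encoding: the constant I is the string "I", nonterminals are other strings, and productions are tuples of length 2, 3, or 4 as in (3). With E11 in hand, the update for a new element [N, R] only needs E11{N} rather than a scan of Q.

```python
from collections import defaultdict

I = "I"  # special constant; nonterminals are other strings (assumption)


def is_r(r):
    """R isR: I or a constructor form (c, (N1, ..., Nk))."""
    return r == I or (isinstance(r, tuple) and len(r) == 2)


def auxiliary_maps(Q):
    """E11 to E41 of (12), each stored as a dict from a key to its image set."""
    E11, E21, E31, E41 = (defaultdict(set) for _ in range(4))
    for p in Q:
        if len(p) == 2 and not is_r(p[1]):  # [N', N] with N isN
            n1, n = p
            E11[n].add(n1)
        elif len(p) == 4:                   # [N', C, i, N]
            n1, c, i, n = p
            E21[n].add(n1)
            E31[(c, n)].add((n1, i))
        elif len(p) == 3:                   # [N', N, R']
            n1, n, r1 = p
            E41[n].add((n1, r1))
    return E11, E21, E31, E41


Q = {("E", "D"), ("D", "cons", 1, "A"), ("F", "A", I), ("A", ("cons", ("B", "C")))}
E11, E21, E31, E41 = auxiliary_maps(Q)
print(dict(E31))  # {('cons', 'A'): {('D', 1)}}
```
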
Transforming Loop Body. We apply finite differencing to the loop body. This
means that we differentiate O, E1 to E4, W, and E11 to E41 with respect to
Q with := p in

  Δ O, E1, E2, E3, E4, W, E11, E21, E31, E41 { Q with := p; }           (14)

Based on the elements added to W, which come through E1 to E4, p can be of the
forms [N, I], [N, [C, T]], and [N', N] where N isN. For each form of p, we
determine how the sets O, E1 to E4, and E11 to E41 are updated. Also, for each
of the forms, we do two things to update W. First, anything added into E1 to E4
that is not in Q is added to W. Second, p is removed from W. We obtain the
following complete code for the loop body:



  Q with := p;
  W less := p;
  case p of                                                            (15)
    [N, R], where R isR :                    // if p is N → R
      O with := [N, R];
      E1 ∪:= {[N', R] : N' in E11{N}};       // add N' → R for all N' → N
      W ∪:= {[N', R] : N' in E11{N} | [N', R] notin Q};
      E4 ∪:= {[N', R'] in E41{N}};           // add N' → R' for all N' → [N] R'
      W ∪:= {[N', R'] in E41{N} | [N', R'] notin Q};
    [N, I] :                                 // if p is N → I
      E2 ∪:= {[N', I] : N' in E21{N}};       // add N' → I for all N' → c_i^{-1}(N)
      W ∪:= {[N', I] : N' in E21{N} | [N', I] notin Q};
    [N, [C, T]] :                            // if p is N → C(T(1), ..., T(k))
      E3 ∪:= {[N', T(i)] : [N', i] in E31{[C, N]}};   // add N' → T(i) for all N' → C_i^{-1}(N)
      W ∪:= {[N', T(i)] : [N', i] in E31{[C, N]} | [N', T(i)] notin Q};
    [N', N], where N isN :                   // if p is N' → N
      E1 ∪:= {[N', R] : R in O{N}};          // add N' → R for all N → R
      W ∪:= {[N', R] : R in O{N} | [N', R] notin Q};
      E11 with := [N, N'];



These updates are the key to achieving high efficiency: after adding a new
production, we consider only the productions that are directly affected.



Initialization. Sets O, E1 to E4, W, and E11 to E41 need to be initialized
together with Q := P in (13). To do this, we add each p from P into Q one by
one, and update each of these sets incrementally as in the loop body. We have
the same four cases of p as in the loop body (15) and cases for two additional
forms of p, namely [N', C, i, N] and [N', N, R']. We obtain the following
complete code for initialization:

  O, E1, E2, E3, E4, W, E11, E21, E31, E41, Q := {};
  for p in P
    Q with := p;
    W less := p;
    case p of
      same four cases of p as in the loop body
      [N', C, i, N] :                        // if p is N' → C_i^{-1}(N)
        E2 ∪:= {[N', I] : I in Q{N}};        // add N' → I if N → I
        W ∪:= {[N', I] : I in Q{N} | [N', I] notin Q};                 (16)
        E21 with := [N, N'];
        E3 ∪:= {[N', T(i)] : [C, T] in Q{N}};     // add N' → T(i) for all N → C(T(1), ..., T(k))
        W ∪:= {[N', T(i)] : [C, T] in Q{N} | [N', T(i)] notin Q};
        E31 with := [[C, N], [N', i]];
      [N', N, R'] :                          // if p is N' → [N] R'
        E4 ∪:= {[N', R'] : R in Q{N} | R isR};    // add N' → R' for all N → R
        W ∪:= {[N', R'] : R in Q{N} | R isR and [N', R'] notin Q};
        E41 with := [N, [N', R']];
Dead-Code Elimination. Since only O is the desired output, it is easy to see
that E1 to E4 are not needed, i.e., they are dead. Furthermore, Q can be
eliminated using the equivalences:

  [N, R] in Q, where R isR     <=>   [N, R] in O
  [N', N] in Q, where N isN    <=>   [N, N'] in E11

We obtain the following complete algorithm: 

  O, W, E11, E21, E31, E41 := {};
  for p in P
    W less := p;
    case p of
      same four cases of p as in the loop body
      [N', C, i, N] :
        W ∪:= {[N', I] : I in O{N} | [N', I] notin O};
        E21 with := [N, N'];
        W ∪:= {[N', T(i)] : [C, T] in O{N} | [T(i), N'] notin E11};
        E31 with := [[C, N], [N', i]];
      [N', N, R'] :
        W ∪:= {[N', R'] : R in O{N} | if R' isR then [N', R'] notin O else [R', N'] notin E11};
        E41 with := [N, [N', R']];
  while exists p in W                                                  (17)
    W less := p;
    case p of
      [N, R], where R isR :
        O with := [N, R];
        W ∪:= {[N', R] : N' in E11{N} | [N', R] notin O};
        W ∪:= {[N', R'] in E41{N} | if R' isR then [N', R'] notin O else [R', N'] notin E11};
      [N, I] :
        W ∪:= {[N', I] : N' in E21{N} | [N', I] notin O};
      [N, [C, T]] :
        W ∪:= {[N', T(i)] : [N', i] in E31{[C, N]} | [T(i), N'] notin E11};
      [N', N], where N isN :
        W ∪:= {[N', R] : R in O{N} | [N', R] notin O};
        E11 with := [N, N'];

where W ∪:= {X : Y in S | Z} is implemented as

  for Y in S
    if Z then                                                          (18)
      W with := X;

Complexity Analysis. For now, we assume that set initialization S := {},
retrieval of an arbitrary element in a set by for or while or of an indexed
element by T(i), element addition and deletion S with/less := X, and
associative access X notin S and M{X} each take O(1) time; Section 6 describes
how to achieve this. Other operations clearly take O(1) time.

Besides the input size #P and the output size #O, i.e., the numbers of
productions in the input and output, respectively, we use the following
parameters. The meanings given for these parameters are based on how the
constraints are constructed in the application. Note that sets E11 to E41 only
grow during the computation, so we consider their values at the end.

- Let a be the maximum of #E21{N}, #E31{[C, N]}, and #E41{N} for any N and C.

  Meaning: In the application, a is the maximum of the arities of constructors,
  primitive functions, and user-defined functions and the number of possible
  outermost constructors in the argument of a tester (such as null). In fact,
  #E21{N} and #E31{[C, N]} are bounded by the maximum arity of constructors
  only.

- Let h be the maximum number of nonterminals to the left of a nonterminal:

  $$h = \max_{N \in \mathrm{dom}\,E11} \#E11\{N\} \tag{19}$$

  Meaning: In the application, for productions built from programs,
  $\#E11\{N\} \le 2$ for any N (2 for a conditional expression, 1 for a binding
  expression and a function call, 0 for others). However, E11 and h may grow
  during simplification.

- Let g be the maximum number of good forms a nonterminal goes to:

  $$g = \max_{N \in \mathrm{dom}\,O} \#O\{N\} \tag{20}$$

  Meaning: In the application, a good form is either I or the right side of a
  constructor form constructed at the argument of a selector or a tester, and
  testers together generate no more than a constructor forms. Thus, g
  corresponds to the maximum of a and the maximum number of selector
  applications into whose arguments the value constructed at a program point
  might flow.

- Let r be the size of the domain of O:

  $$r = \#\,\mathrm{dom}\,O \tag{21}$$

  Meaning: In the application, r is the number of live program points. Note
  that $\#O \le r \cdot g$.

- Let n be the number of nonterminals in P.

  Meaning: In the application, n is the number of program points plus the
  number of nonterminals introduced in a user query. A user query usually has a
  small number of productions, and at most a + 1 productions are constructed at
  each program point, so usually $\#P \le n \cdot a$.

Parameter n is not used in the precise complexity analysis, but it best
captures program size. Also, n bounds h, and #P bounds g; the latter is because
all good forms are in the given productions, so there are at most #P of them.
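
The parameters can be read off the final maps. This Python sketch assumes each map is a dict from a key to its image set and uses small made-up maps for illustration.

```python
def parameters(O, E11, E21, E31, E41):
    """Compute #O, r, g, h, and a from final maps (layout is an assumption)."""
    def max_image(m):
        return max((len(s) for s in m.values()), default=0)

    return {
        "#O": sum(len(s) for s in O.values()),
        "r": sum(1 for s in O.values() if s),          # (21): #dom O
        "g": max_image(O),                             # (20)
        "h": max_image(E11),                           # (19)
        "a": max(max_image(E21), max_image(E31), max_image(E41)),
    }


O = {"A": {"I", ("cons", ("B", "C"))}, "B": {"I"}}
E11 = {"B": {"D", "E"}}
E21 = {"A": {"D"}}
E31 = {("cons", "A"): {("D", 1)}}
E41 = {"A": {("F", "I"), ("K", "I")}}
params = parameters(O, E11, E21, E31, E41)
print(params)  # {'#O': 3, 'r': 2, 'g': 2, 'h': 2, 'a': 2}
```
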

The complexity is the sum of (i) a constant for each element considered for
addition to W, as in all the assignments to W, (ii) a constant for each element
in W, as in the iterations, and (iii) a constant for each element in P, as in
the initialization. Clearly, (ii) is bounded by (i), and (iii) is O(#P). The
total for (i) is the sum of (c1) to (c8) below, where (c1) to (c5) are for
cases 1 to 4 in both the iteration and the initialization, and (c6) to (c8) are
for cases 5 and 6 in the initialization, as explained below.



cases 1-3:
  $$\sum_{[N,R] \in O} \#E11\{N\} \tag{c1}$$
  $$\sum_{[N,I] \in O} \#E21\{N\} \tag{c2}$$
  $$\sum_{[N,[C,T]] \in O} \#E31\{[C,N]\} \tag{c3}$$
  $$\sum_{[N,R] \in O} \#E41\{N\} \tag{c4}$$
case 4:
  $$\sum_{[N,N'] \in E11} \#O\{N\} \tag{c5}$$
case 5:
  $$\sum_{[N',C,i,N] \in P} \#\{I \in O\{N\}\} \tag{c6}$$
  $$\sum_{[N',C,i,N] \in P} \#\{[C,T] \in O\{N\}\} \tag{c7}$$
case 6:
  $$\sum_{[N',N,R'] \in P} \#\{R \in O\{N\}\} \tag{c8}$$

For each p of the form [N, R], all N' in E11{N} and all [N', R'] in E41{N} are
considered; since each p of the form [N, R] is added to set O, the total
complexity for case 1 is (c1) plus (c4). The other cases are similar.

Using the parameters introduced above, we have

  $$(c1) \le h \cdot \#O,\qquad (c2) \le a \cdot r,\qquad (c3) \le a \cdot \#O,\qquad (c4) \le a \cdot \#O \tag{22}$$

Note that

  $$(c1) = (c5) = \sum_{N \in \mathrm{dom}\,O} \#E11\{N\} \cdot \#O\{N\} \tag{23}$$

A second way of estimating (c1) and (c5) is

$$
\begin{aligned}
(c1) = (c5) &\le \#\{[N,N'] \in E11 \mid N \in \mathrm{dom}\,O\} \cdot g && \text{by (20)}\\
&\le \#\{[N',N] \in Q \mid N \in \mathrm{dom}\,O\} \cdot g && \text{by definition of } E11\\
&\le \big(\#\{[N',N] \in P \mid N \in \mathrm{dom}\,O\} && \text{those of form } [N',N] \text{ in } P\\
&\qquad + \#\{[N',N] \in E3 \mid N \in \mathrm{dom}\,O\} && \text{those of form } [N',N] \text{ in } E3\\
&\qquad + \#\{[N',N] \in E4 \mid N \in \mathrm{dom}\,O\}\big) \cdot g && \text{those of form } [N',N] \text{ in } E4 \text{ where } N \text{ isN}\\
& && \text{(these three contribute all of form } [N',N] \text{ in } Q\text{)}\\
&\le (r + (c3) + (c4)) \cdot g\\
&\le (r + a \cdot \#O + a \cdot \#O) \cdot g
\end{aligned}
\tag{24}
$$

Therefore, (c1) and (c5) are $O(\#O \cdot g \cdot a)$. Thus, the sum of (c1)
through (c5) is $O(\#O \cdot (h + a))$, using the first way of estimating (c1)
and (c5), and $O(\#O \cdot g \cdot a)$, using the second way. Also,

  $$(c6),\ (c7),\ (c8) \le g \cdot \#P \tag{25}$$

Thus, the total complexity of (i) to (iii) is
$O(\#O \cdot \min(h + a,\ g \cdot a) + \#P \cdot g + \#P)$, which is

  $$O(\#O \cdot \min(h + a,\ g \cdot a) + \#P \cdot g) \tag{26}$$

since $\#O > 0$ and thus $g \ge 1$ in the application.

In the application, productions in P with right sides in good forms are from
the user query; if we assume there is a constant number of them, then (c6) to
(c8) are O(#P), and the total complexity is
$O(\#O \cdot \min(h + a,\ g \cdot a) + \#P)$.
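
As an arithmetic sanity check of (22) and (24), the following Python snippet tests the inequalities on the rows of Table 1 below. The values are transcribed from that table; the g entry for incsort/sort is read as 11 from a damaged cell, which is an assumption. The check only confirms that the reported numbers are consistent with the bounds; it does not prove the bounds.

```python
# Rows of Table 1: (program, query, #O, r, a, h, g, c1, c2, c3, c4, c4', c).
ROWS = [
    ("bigfun", "lenf", 47, 23, 2, 2, 3, 40, 0, 4, 24, 14, 68),
    ("minmax", "getlen", 89, 31, 3, 2, 5, 76, 0, 8, 48, 23, 132),
    ("minmax", "getmin", 149, 49, 3, 2, 8, 129, 2, 38, 72, 33, 241),
    ("biggerfun", "evef", 114, 64, 2, 2, 5, 86, 2, 14, 64, 45, 166),
    ("biggerfun", "oddf", 115, 56, 2, 2, 6, 94, 2, 16, 60, 36, 172),
    ("worst", "f", 69, 24, 2, 4, 4, 64, 0, 0, 21, 12, 85),
    ("worst10", "f", 419, 59, 2, 11, 11, 407, 0, 0, 133, 33, 540),
    ("worst20", "f", 1429, 109, 2, 21, 21, 1407, 0, 0, 463, 63, 1870),
    ("incsort", "sort", 132, 49, 3, 2, 11, 139, 2, 20, 98, 29, 259),
    ("incsort", "sort'", 33, 24, 3, 2, 5, 24, 6, 0, 15, 11, 45),
    ("incout", "out", 53, 30, 5, 2, 4, 43, 4, 0, 24, 18, 71),
    ("incout", "out'", 77, 55, 5, 2, 5, 56, 8, 0, 48, 36, 112),
    ("cachebin", "bin", 113, 67, 3, 4, 5, 105, 0, 51, 65, 41, 221),
    ("cachelcs", "lcs", 205, 89, 4, 6, 7, 214, 0, 152, 104, 48, 470),
    ("calend", "gregorian-", 228, 192, 5, 12, 4, 178, 0, 66, 115, 111, 359),
    ("calend", "islamic-", 418, 346, 5, 12, 4, 339, 4, 144, 199, 189, 686),
]


def satisfies_bounds(row):
    """Check c = c1+c2+c3+c4, (22), the last steps of (24), (c4') and #O <= r*g."""
    _, _, n_o, r, a, h, g, c1, c2, c3, c4, c4p, c = row
    return (c == c1 + c2 + c3 + c4
            and c1 <= h * n_o                              # (22), first way
            and c1 <= (r + c3 + c4) * g                    # (24), second way
            and c2 <= a * r and c3 <= a * n_o and c4 <= a * n_o   # (22)
            and c4p <= a * r                               # (c4') <= a * r
            and n_o <= r * g)                              # #O <= r * g


print(all(satisfies_bounds(row) for row in ROWS))  # True
```
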

5 Higher-Level Design and Analysis 

Avoiding Duplication of Code for Initialization. Algorithm (17) duplicates the
code of the loop body in the initialization. Cai and Paige 0 proposed a
high-level transformation that can drastically simplify the initialization and
do all the work in the loop body. By Theorem 5 in 0, the fixed-point expression
(6) is equivalent to

  $$\mathrm{LFP}_{\subseteq,\{\}}(P \cup F(Q) \cup Q,\ Q) \tag{27}$$

which can be transformed into

  Q := {};
  while exists p in P ∪ F(Q) − Q                                       (28)
    Q with := p;

This merges the initialization for Q := P into the iteration and thus avoids
code duplication. However, this merging reduces the accuracy of the complexity
analysis. The complexity analysis is similar to that in Section 4, and the
total complexity is again $O(\#O \cdot \min(h + a,\ g \cdot a) + \#P \cdot g)$.
We cannot obtain $O(\#O \cdot \min(h + a,\ g \cdot a) + \#P)$ here, even with
the additional assumption about the user query, because (c6) to (c8) now come
from the main loop, where g is not bounded by a constant.

We propose a general method that not only eliminates code duplication
completely but also yields smaller code overall and a more accurate complexity
analysis. The method is to merge into the main loop only the cases in the
initialization that must be handled in the main loop, not the cases that are
needed only in the initialization. Our method is supported by the following
theorem.

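Theorem 1. For all $P_0 \subseteq P$,
$\mathrm{LFP}_{\subseteq,P_0}((P - P_0) \cup F(Q) \cup Q,\ Q)$ exists if and only
if $\mathrm{LFP}_{\subseteq,P}(F(Q) \cup Q,\ Q)$ exists, and if they exist, they
are equal.

Proof.
$$
\begin{aligned}
\mathrm{LFP}_{\subseteq,P_0}((P - P_0) \cup F(Q) \cup Q,\ Q)
&= \mathrm{LFP}_{\subseteq,\{\}}(P_0 \cup (P - P_0) \cup F(Q) \cup Q,\ Q)\\
&= \mathrm{LFP}_{\subseteq,\{\}}(P \cup F(Q) \cup Q,\ Q)\\
&= \mathrm{LFP}_{\subseteq,P}(F(Q) \cup Q,\ Q). \qquad \square
\end{aligned}
$$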
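
Theorem 1 can be checked on a small instance. The Python sketch below uses a hypothetical reachability function as F and compares the two least fixed points for every subset P0 of P; this is an illustration, not a proof.

```python
def lfp_from(seed, g):
    """LFP_{⊆,seed}(g(Q) ∪ Q, Q) by iteration; assumes g monotone on finite sets."""
    q = set(seed)
    while True:
        q_next = g(q) | q
        if q_next == q:
            return q
        q = q_next


edges = {(1, 2), (2, 3), (4, 5)}
F = lambda q: {b for (a, b) in edges if a in q}
P = {1, 4}

for P0 in [set(), {1}, {4}, {1, 4}]:            # every subset P0 of P
    lhs = lfp_from(P0, lambda q: (P - P0) | F(q))
    rhs = lfp_from(P, F)
    print(sorted(P0), lhs == rhs)               # prints True for each P0
```
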

We apply Theorem 1 with P0 = {p in P | p of [N', C, i, N] or p of [N', N, R']}.
The fixed-point expression is then equivalent to
$\mathrm{LFP}_{\subseteq,P_0}((P - P_0) \cup F(Q) \cup Q,\ Q)$. Transforming this
into a while-loop and applying finite differencing yields the following
complete algorithm, which has the same iteration as algorithm (17) and
initializes O and E11 to {}, E21 through E41 for p in P0 as in (16), and W to
P − P0:

  O, W, E11, E21, E31, E41 := {};
  for p in P
    case p of
      [N', C, i, N] :
        E21 with := [N, N'];
        E31 with := [[C, N], [N', i]];                                 (29)
      [N', N, R'] :
        E41 with := [N, [N', R']];
      other :
        W with := p;
  same iteration as in algorithm (17)
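
Putting the pieces together, algorithm (29) followed by the iteration of (17) can be sketched in Python as below. Assumptions: the constant I is the string "I", nonterminals are other strings, constructor forms are (c, (N1, ..., Nk)), selector indices are 1-based, and maps are dicts from keys to image sets, so lookups are average-case rather than worst-case O(1). The overlapping cases [N, R] and [N, I] or [N, [C, T]] are handled by running the general case first and then the specific one.

```python
from collections import defaultdict

I = "I"  # special constant; nonterminals are other strings (assumption)


def is_r(r):
    """R isR: I or a constructor form (c, (N1, ..., Nk))."""
    return r == I or (isinstance(r, tuple) and len(r) == 2)


def simplify(P):
    """Sketch of algorithm (29) followed by the iteration of (17).

    Returns O as a set of pairs (N, R) with R isR.
    """
    O, E11, E21, E31, E41 = (defaultdict(set) for _ in range(5))
    W = set()
    for p in P:                               # initialization of (29)
        if len(p) == 4:                       # [N', C, i, N]
            n1, c, i, n = p
            E21[n].add(n1)
            E31[(c, n)].add((n1, i))
        elif len(p) == 3:                     # [N', N, R']
            n1, n, r1 = p
            E41[n].add((n1, r1))
        else:                                 # other: W with := p
            W.add(p)

    def add(n1, r):
        """W with := [N', R] unless it is already in Q (via O or E11)."""
        absent = r not in O[n1] if is_r(r) else n1 not in E11[r]
        if absent:
            W.add((n1, r))

    while W:                                  # while exists p in W
        n, r = W.pop()                        # W less := p
        if is_r(r):                           # [N, R], where R isR
            O[n].add(r)
            for n1 in E11[n]:
                add(n1, r)
            for (n1, r1) in E41[n]:
                add(n1, r1)
            if r == I:                        # [N, I]
                for n1 in E21[n]:
                    add(n1, I)
            else:                             # [N, [C, T]]
                c, t = r
                for (n1, i) in E31[(c, n)]:
                    add(n1, t[i - 1])         # T(i), with 1-based i
        else:                                 # [N', N], where N isN
            n1, m = n, r
            for r2 in O[m]:
                add(n1, r2)
            E11[m].add(n1)                    # E11 with := [N, N']
    return {(n, r) for n, rs in O.items() for r in rs}


P = {
    ("A", ("cons", ("B", "C"))), ("B", I), ("D", "cons", 1, "A"), ("E", "D"),
    ("F", "A", I), ("G", "cons", 2, "H"), ("H", I),
    ("K", "Z", I),           # conditional never enabled: Z has no good form
    ("M", "nil", 1, "A"),    # selector whose constructor never reaches A
}
result = simplify(P)
```

Because the result is a least fixed point, it should not depend on the order in which W.pop() returns elements; the example above yields A → cons(B, C) plus N → I for B, D, E, F, G, and H.
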

The complexity analysis is the same as in Section 4, except that the
corresponding (c6) to (c8) in (i) equal zero here, and (ii) here is bounded by
the sum of (ii) and (iii) there. Thus, the total complexity is

  $$O(\#O \cdot \min(h + a,\ g \cdot a) + \#P) \tag{30}$$

which is better than the complexity (26) obtained for (17).

Handling Multiple Queries. In the application, there can be many queries about
a program. We can transform the above algorithm so that initialization is done
once, in time linear in the size of the program, and simplification after each
query takes time roughly linear in the number of live program points. In
particular, initialization can be done concurrently with the construction of
the productions.

Let P0 be the set of productions constructed from the given program; it
contains only productions of copy, selector, and conditional forms. Let P1 be
the set of productions from a user query; they are all in good forms. Thus,
based on Theorem 1, initialization using P0 followed by simplification using P1
can be specified as

  $$\mathrm{LFP}_{\subseteq,P_0}(P_1 \cup F(Q) \cup Q,\ Q) \tag{31}$$
which is transformed into

  Q := P0;
  while exists p in P1 ∪ F(Q) − Q                                      (32)
    Q with := p;

Applying finite differencing in a similar way as above yields an algorithm that
takes

  $$O(\#O \cdot \min(h + a,\ g \cdot a)) \tag{33}$$

time for simplification after a query.
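
For multiple queries as in (31) and (32), a Python sketch can build E11 to E41 once from P0 and run only the iteration per query. The encoding is the same hypothetical one as elsewhere (constant I as "I", nonterminals as other strings, constructor forms as (c, (N1, ..., Nk)), 1-based selector indices). This sketch makes two assumptions beyond the text: it omits the optimization to conditional forms, so E41 is only read, and it copies E11 for each query, because E11 grows during simplification and, in this sketch, a query should not see copy productions derived for an earlier one.

```python
import copy
from collections import defaultdict

I = "I"  # special constant; nonterminals are other strings (assumption)


def is_r(r):
    return r == I or (isinstance(r, tuple) and len(r) == 2)


class QuerySimplifier:
    """Initialize once from program productions P0 (copy, selector, and
    conditional forms), then simplify for each query P1 (good forms)."""

    def __init__(self, P0):
        self.E11, self.E21, self.E31, self.E41 = (defaultdict(set) for _ in range(4))
        for p in P0:                          # Q := P0; F(P0) is empty
            if len(p) == 4:
                n1, c, i, n = p
                self.E21[n].add(n1)
                self.E31[(c, n)].add((n1, i))
            elif len(p) == 3:
                n1, n, r1 = p
                self.E41[n].add((n1, r1))
            else:
                n1, n = p
                self.E11[n].add(n1)

    def query(self, P1):
        O = defaultdict(set)
        E11 = copy.deepcopy(self.E11)         # grows per query; keep the original
        W = set(P1)

        def add(n1, r):                       # W with := [N', R] if not in Q
            absent = r not in O[n1] if is_r(r) else n1 not in E11[r]
            if absent:
                W.add((n1, r))

        while W:
            n, r = W.pop()
            if is_r(r):
                O[n].add(r)
                for n1 in E11[n]:
                    add(n1, r)
                for (n1, r1) in self.E41.get(n, ()):
                    add(n1, r1)
                if r == I:
                    for n1 in self.E21.get(n, ()):
                        add(n1, I)
                else:
                    c, t = r
                    for (n1, i) in self.E31.get((c, n), ()):
                        add(n1, t[i - 1])
            else:                             # copy form [N', N]
                for r2 in O[r]:
                    add(n, r2)
                E11[r].add(n)
        return {(n, r) for n, rs in O.items() for r in rs}


s = QuerySimplifier({("D", "cons", 1, "A"), ("E", "D"), ("F", "A", I),
                     ("G", "cons", 2, "H")})
first = s.query({("A", ("cons", ("B", "C"))), ("B", I), ("H", I)})
```
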

An Optimization to Conditional Forms. For a production p of the form [N, R]
where R isR, we can add the following updates at the end of handling that form,
so as to avoid unnecessarily enabling any conditional form more than once:

  Q −:= {[N', N, R'] in Q};
  E41 −:= {[N, [N', R']] in E41};

Then the assignment to Q will be deleted by dead-code elimination, and the
assignment to E41 is simply E41{N} := {}. This optimization can be applied to
all algorithms derived above.

For complexity analysis, we only need to change formula (c4) to

  $$\sum_{N \in \mathrm{dom}\,O} \#E41\{N\} \tag{c4'}$$

Therefore, $(c4') \le a \cdot r$. This does not change the overall asymptotic
complexities.
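
The effect of the optimization on (c4) versus (c4') can be seen by counting how often conditional productions are considered as several good forms for the same N arrive. This Python sketch uses a hypothetical E41 and arrival sequence.

```python
def consider_conditionals(good_forms, E41, optimize):
    """Count how often conditional productions N' -> [N] R' are considered
    as good forms [N, R] arrive one at a time (the sums (c4) and (c4')).

    E41 maps N to a set of pairs (N', R'); it is modified when optimize is True.
    """
    considered = 0
    enabled = set()
    for (n, _r) in good_forms:            # each [N, R], R isR, added to O
        for (n1, r1) in E41.get(n, ()):
            considered += 1
            enabled.add((n1, r1))
        if optimize:
            E41[n] = set()                # E41{N} := {}
    return considered, enabled


arrivals = [("A", "I"), ("A", ("cons", ("B", "C"))), ("A", ("nil", ()))]
plain = consider_conditionals(arrivals, {"A": {("F", "I"), ("K", "I")}}, False)
opt = consider_conditionals(arrivals, {"A": {("F", "I"), ("K", "I")}}, True)
print(plain[0], opt[0], plain[1] == opt[1])  # 6 2 True
```

Both runs enable the same conditional productions; only the number of times they are considered drops, from once per good form of N to once per N.
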

For handling multiple queries, since this optimization updates E41 in the
iteration, we need to preserve E41 after the initialization. To do this, we
simply use a new set E41' to function as E41 in the iteration: insert
E41' := E41 immediately before the iteration, which can be a pointer
assignment, and in the iteration, replace all uses of E41 by E41'. This does
not change the complexity.

6 Lower-Level Implementation and Experiments

We consider implementation of the two best algorithms, (29) for one query and
the algorithm obtained from (32) for multiple queries. The same data structures
for representing sets are suitable for both. All sets involved are clearly
finite, based on the analysis in Sections 4 and 5.

Low-Level Set Operations. All the sets constructed in our algorithms are in
fact maps, i.e., sets of pairs. To make this explicit, we do the following
three groups of replacements in order:

  1) replace   while exists Z in M        with   while exists X in dom M
                 ...Z...                            while exists Y in M{X}
                                                      ...[X, Y]...

  2) replace   M with := [X, Y]           with   M{X} with := Y
     replace   M less := [X, Y]           with   M{X} less := Y
     replace   [X, Y] notin M             with   Y notin M{X}

  3) replace   S with := X                with   if X notin S
                                                    S with := X
The first two groups treat the domain of a map M as a set and the image of M at
each element X as a set. The third guarantees that an addition is only for an
element not already in the set; in general, similar replacements are done for
deletions as well, but the only deletion in our algorithms is of an arbitrary
element retrieved from the same set, and thus already located in it. We do not
need to transform for-loops in our algorithms, since they enumerate sets of
tuples that are only read; we introduce pattern matching to make the components
of these tuples explicit, so that the other replacements apply in the loop body.
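
The replacements can be mirrored with a Python dict from each domain element to its image set. The helper names map_add, map_has, and map_pop are hypothetical. Python sets already ignore duplicate additions, so the explicit guard in map_add is there only to mirror replacement 3).

```python
def map_add(m, x, y):
    """M with := [X, Y] becomes M{X} with := Y, guarded by Y notin M{X}."""
    image = m.setdefault(x, set())
    if y not in image:
        image.add(y)


def map_has(m, x, y):
    """[X, Y] in M becomes Y in M{X}."""
    return y in m.get(x, ())


def map_pop(m):
    """Retrieve and delete an arbitrary [X, Y], as in the nested loops
    'while exists X in dom M' and 'while exists Y in M{X}'."""
    x = next(iter(m))        # an arbitrary element of dom M
    image = m[x]
    y = image.pop()          # an arbitrary element of M{X}
    if not image:
        del m[x]             # keep dom M equal to the keys with nonempty images
    return x, y


W = {}
map_add(W, "N1", "I")
map_add(W, "N1", "I")        # duplicate addition is filtered by the guard
map_add(W, "N2", "N1")
drained = []
while W:
    drained.append(map_pop(W))
print(sorted(drained))  # [('N1', 'I'), ('N2', 'N1')]
```
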

After the replacements, all the set operations are restricted to those
described in Section 4, with the above guarantees about elements added or
deleted. To support the complexity analysis in Sections 4 and 5, each of these
operations needs to be done in O(1) time.

Data Structure Selection. Consider using a singly linked list for each of the
domain and image sets of O, W, and E11 to E41. Let each element in a domain
linked list contain a pointer to its image linked list, i.e., represent a map as
a linked list of linked lists. It is easy to see that all operations except
indexed retrieval and associative access can be done in worst-case O(1) time.
The indexed retrievals are for tuples that are never updated and can be
implemented using arrays. However, an associative access would take linear time
if a linked list were naively traversed. A classical approach is to use hash
tables instead of linked lists. This gives average-case, rather than worst-case,
O(1) time for each operation, and has the overhead of computing hashing-related
functions for each operation.

Paige et al. describe a technique for designing linked structures that support
associative access in worst-case O(1) time with little space overhead. Consider

  for X in W  or  while exists X in W
    ...X in S...  or  ...X notin S...  or  ...M{X}...,  where the domain of M is S

We want to locate value X in S after it has been located in W. The idea is to
use a finite universal set B, called a base, to store values for both W and S,
so that retrieval from W also locates the value in S. B is represented as a set
(this set is only conceptual) of records, with a K field storing the key (i.e.,
the value). Set S is represented using an S field of B: records of B whose keys
belong to S are connected by a linked list where the links are stored in the S
field; records of B whose keys are not in S store a special value for undefined
in the S field. Set W is represented as a separate linked list of pointers to
records of B whose keys belong to W. Thus, an element of S is represented as a
field in the record, and S is said to be strongly based on B; an element of W is
represented as a pointer to the record, and W is said to be weakly based on B.
This representation allows an arbitrary number of weakly based sets but only a
constant number of strongly based sets. Essentially, base B provides a kind of
indexing.
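
A Python sketch of a strongly based set S and a weakly based set W over a base B, under the assumption that records are Python objects and "pointers" are object references. The field name S_link and the sentinel _UNDEF are names chosen for this sketch. Deletion, which would need a doubly linked list for O(1) time, is omitted.

```python
_UNDEF = object()   # the "undefined" value: key not in S


class Record:
    """A record of base B: key field K plus one link field for set S."""
    __slots__ = ("K", "S_link")

    def __init__(self, key):
        self.K = key
        self.S_link = _UNDEF    # _UNDEF means K is not in S


class StronglyBasedSet:
    """Set S threaded through the S_link fields of base records.

    Membership is a field test on the record, so once an element has been
    located (for example, retrieved from a weakly based set W as a record
    reference), testing it in S takes O(1) time with no search.
    """

    def __init__(self):
        self.head = None        # None ends the list; distinct from _UNDEF

    def contains(self, rec):
        return rec.S_link is not _UNDEF

    def add(self, rec):
        if rec.S_link is _UNDEF:          # insert only if not already in S
            rec.S_link = self.head
            self.head = rec

    def keys(self):
        rec = self.head
        while rec is not None:
            yield rec.K
            rec = rec.S_link


B = {k: Record(k) for k in ["N1", "N2", "N3"]}   # base: one record per key
S = StronglyBasedSet()
S.add(B["N2"])
W = [B["N1"], B["N2"]]          # weakly based: references to records of B
print([rec.K for rec in W if S.contains(rec)])  # ['N2']
```

The dict B is used here only to create the records; in the scheme described above, every mention of a value is already a pointer to its record.
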

Our while-loop retrieves elements from the domain of W and locates these
elements in the domains of O and E11 to E41. For example, at O{N} in case 4 in
the main loop, nonterminal N needs to be located in the domain of O. We use a
base B for the set of nonterminals. The domain of W is weakly based on B, and
the domains of O and E11 to E41 are strongly based on B. The only exception is
that the domain of E31 needs a two-element key of the form [C, N], but in the
application, each N has only one corresponding C, so we simply use N as the key
and record the corresponding C in a separate field to be checked against.

Our algorithms test whether a value is not in the images of O, W, and E11 to
E41 at any element in their domains, so there are O(n) sets that would need to
be strongly based, and thus the based-representation method does not apply
here. We describe three representations for these images and discuss the
trade-offs.

Data Structure Choices and Trade-Offs. The images of O, W, and E11 to E41 can be
implemented using arrays, linked lists, hash tables, or a combination of linked
lists and hash tables.

First, for the O(n) images of each of O, W, and E11 to E41, we may make them
strongly based using an array of fields. This includes making a base B2 for the
set of good forms. Each membership test takes worst-case O(1) time. However,
this requires a total of quadratic space. Quadratic initialization time can be
avoided using the technique in PJ Exercise 2.12].

Second, we may use a singly linked list for each of the images of O, W, and E11
to E41. Such a list is called an unbased representation m if it is a list of
elements rather than a list of pointers to the elements in some base. Due to
other associative accesses in the main loop body, any mention of a nonterminal
(in images of W, E11, and E21, in domains of the images of E31, and in domains
and images of the images of E41) should be implemented as a pointer to an
element in base B. We also make a base B2 for the set of good forms (where
nonterminals in the arguments of constructor forms are also implemented as
pointers to elements in B), and represent any mention of a good form (in images
of O and W and in images of the images of E41) as a pointer to an element in
B2; use of B2 avoids an extra factor of a in the time complexity for comparing
constructor forms if specialized constructor forms are not used. The
linked-list representation incurs no asymptotic space overhead, but each
membership test takes worst-case O(l) time, where l is the length of such a
linked list. Based on the parameters introduced in Section 4, we know that
$l \le a$ for the images of E21, E31, and E41, $l \le h$ for the images of E11,
and $l \le g$ for the images of O. Also, each element in W either has a right
side in a good form or is a copy form, and thus $l \le g + f$ for the images of
W, where f is the dual of h, i.e., the maximum number of nonterminals to the
right of a nonterminal:

  $$f = \max_{N \in \mathrm{dom}(\mathrm{inv}\,E11)} \#(\mathrm{inv}\,E11)\{N\} \tag{34}$$

In the application, f is bounded by the maximum of g + 1, the number of live
call sites of any function, and the number of live occurrences of any formal
parameter or bound variable. For (29) and the algorithm obtained from (32), the
time for initialization is increased by a factor of a, and the time for the
main loop is increased by a factor of $h + g + f$. This representation works
well if h, g, and f are small. It works well for all our examples except a
contrived worst-case example.
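
Both h and f can be computed from E11. The Python sketch below assumes E11 is a dict from N to the set of N' with a copy production N' → N, and uses a made-up example.

```python
def h_and_f(E11):
    """h of (19) and f of (34) from E11, which maps N to {N' : N' -> N}.

    h is the largest image of E11 (most nonterminals to the left of a
    nonterminal); f is the largest image of inv E11 (most nonterminals to
    the right of a nonterminal).
    """
    inv = {}
    for n, lefts in E11.items():
        for n1 in lefts:
            inv.setdefault(n1, set()).add(n)     # inv E11 maps N' to N
    h = max((len(s) for s in E11.values()), default=0)
    f = max((len(s) for s in inv.values()), default=0)
    return h, f


# Copy productions A -> X, A -> Y, A -> Z, B -> X (hypothetical example).
E11 = {"X": {"A", "B"}, "Y": {"A"}, "Z": {"A"}}
print(h_and_f(E11))  # (2, 3)
```
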

Third, we may maintain a hash table for each of the image sets. This achieves
the time complexities analyzed in Sections 4 and 5, but they become
average-case, rather than worst-case, complexities.

Finally, we can use linked lists when the images are small, and use hash tables
when the images are larger. This achieves the same complexities analyzed in
Sections 4 and 5, also in the average case.
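
The combined representation can be sketched as an image set that starts as a list and switches to a hash set once it grows past a threshold. The class name and the threshold value are assumptions for illustration, not values from the experiments.

```python
class ImageSet:
    """An image set kept as a list while small and as a hash set once it
    has more than `threshold` elements (threshold is a tuning assumption)."""

    def __init__(self, threshold=8):
        self.threshold = threshold
        self.items = []          # used while small: linear membership test
        self.table = None        # used once large: hashed membership test

    def __contains__(self, x):
        if self.table is not None:
            return x in self.table
        return x in self.items

    def add(self, x):
        if x in self:            # additions are only for elements not in the set
            return
        if self.table is not None:
            self.table.add(x)
        else:
            self.items.append(x)
            if len(self.items) > self.threshold:
                self.table = set(self.items)   # switch representations once
                self.items = []

    def __iter__(self):
        return iter(self.table if self.table is not None else self.items)

    def __len__(self):
        return len(self.table) if self.table is not None else len(self.items)


s = ImageSet(threshold=2)
for x in ["I", "N1", "N1", "N2"]:
    s.add(x)
print(len(s), s.table is not None, "N1" in s, "N3" in s)  # 3 True True False
```
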

Experiments. We implemented the simplification algorithm obtained from (32)
with the optimization to conditional forms and used it to replace a previous
algorithm in a prototype system for dead-code analysis and elimination. The
prototype system is implemented using the Synthesizer Generator [20], and the
simplification algorithms are written in a dialect of Scheme. We have used the
system to analyze dozens of examples. Table 1 reports measurements of the most
relevant parameters (as defined in the complexity analysis in Section 4, plus
c4' in Section 5 and f in Section 6) and simplification times from analyzing
14 programs with 25 different queries using the new simplification algorithm.

Programs bigfun, minmax, and biggerfun are examples from m. worst, worst10, and
worst20 are examples contrived to demonstrate the worst-case cubic-time
complexity. incsort and incout are incremental programs for selection sort and
outer product, respectively, derived using incrementalization, where dead code
after incrementalization is to be eliminated. cachebin and cachelcs are
dynamic-programming programs for binomial coefficients and longest common
subsequences, respectively, derived using cache-and-prune [2BI24j, where cached
intermediate results that are not used are to be pruned. calend, symbdiff,
takr, and boyer are taken from the Internet Scheme Repository m. calend is a
collection of calendrical functions; takr is a 100-function version of TAK that
tries to defeat cache memory effects; symbdiff does symbolic differentiation;
boyer is a logic programming benchmark.

The queries are of the form N → I, where N corresponds to the return value of a
function in the second column of Table 1. In general, especially for libraries
such as the calend example, there may be multiple functions of interest; we
included an example where we picked 22 functions at once.

First of all, the analysis is effective, as reflected in the resulting number
of live program points r compared to the total number of program points n. For
some examples, the program after dead-code elimination is even asymptotically
faster I2ni’. We also observe: 1) #P ranges from 1.02n to 1.56n, 2) a is
consistently very small, 3) h varies widely, 4) g and f are typically quite
small, and 5) #O is roughly linear in r and in g. Whether the observations
about g and f hold for large programs needs more experiments, but regardless,
the measurements help confirm that the second way of estimating (c1) and (c5),
not using h, better explains the running time in practice.

The simplification time after initialization, in milliseconds, with and without
garbage-collection time, is measured on a SUN station SPARC 20 with a 60 MHz CPU
and 256 MB of main memory. The times in Table 1 are for when linked lists are
used for the images of O, W, and E11 to E41. We also measured the times for
hash tables and for linked lists combined with hash tables; both of these are
slower. The optimization to conditional forms gives up to a 15% speedup.

We can see that the simplification time is very much linear in
c = (c1) + (c2) + (c3) + (c4), that is, it is roughly linear in #O with a small
factor from g, and thus it is linear in r and quadratic in g. Being close to
linear in r rather than n is important, especially for analyzing libraries.
Again, experiments measuring g for large programs are needed, but our
measurements confirm the accurate complexities analyzed in terms of the
identified parameters.



Table 1. Measurements for Example Programs. 



program   user query  #P    #O    n     r    a  h   g   f   c1,c5  c2  c3   c4   c4'  c     simp. time w/gc  simp. time no gc
bigfun    lenf        48    47    36    23   2  2   3   3   40     0   4    24   14   68    .002             .001
minmax    getlen      112   89    81    31   3  2   5   11  76     0   8    48   23   132   .006             .005
minmax    getmin      112   149   81    49   3  2   8   11  129    2   38   72   33   241   .010             .007
biggerfun evef        115   114   84    64   2  2   5   10  86     2   14   64   45   166   .008             .007
biggerfun oddf        115   115   84    56   2  2   6   6   94     2   16   60   36   172   .008             .007
worst     f           28    69    24    24   2  4   4   4   64     0   0    21   12   85    .005             .004
worst10   f           70    419   59    59   2  11  11  11  407    0   0    133  33   540   .028             .018
worst20   f           130   1429  109   109  2  21  21  21  1407   0   0    463  63   1870  .097             .068
incsort   sort        144   132   108   49   3  2   11  5   139    2   20   98   29   259   .010             .007
incsort   sort'       144   33    108   24   3  2   5   5   24     6   0    15   11   45    .002             .001
incout    out         152   53    117   30   5  2   4   3   43     4   0    24   18   71    .003             .002
incout    out'        152   77    117   55   5  2   5   4   56     8   0    48   36   112   .005             .004
cachebin  bin         91    113   74    67   3  4   5   5   105    0   51   65   41   221   .009             .006
cachelcs  lcs         140   205   117   89   4  6   7   5   214    0   152  104  48   470   .018             .014
calend    gregorian-  1840  228   1551  192  5  12  4   25  178    0   66   115  111  359   .018             .015
calend    islamic-    1840  418   1551  346  5  12  4   25  339    4   144  199  189  686   .034             .024

calend 


eastern- 


1840 


460 


1551 


375 


5 


24 


4 


25 


380 


4 


186 


207 


197 


777 


.038 


.030 


calend 


yahrzeit 


1840 


484 


1551 


428 


5 


11 


4 


25 


373 


0 


108 


293 


290 


774 


.038 


.030 


calend 


22 functions 


1861 


1604 


1551 


1352 


5 


37 


4 


25 


1329 


41 


614 


791 


777 


2775 


.13 


.10 


symbdiff 


deriv 


1974 


7636 


1264 


1221 


3 


65 


L3 


35 


11045 


28 


206 


3639 


855 


17918 


.59 


.48 


symbdiff 


derivations-x 


1974 


7784 


1264 


1261 


3 


65 


13 


65 


11214 


30 


206 


6686 


878 


18136 


.60 


.48 


takr 


tak99 


4005 


2800 


2804 


2800 


3 


4 




5 


3000 


0 


0 


2200 


2200 


5200 


.23 


.21 


takr 


run-takr 


4005 


2804 


2804 


2804 


3 


5 




5 


3004 


0 


0 


2203 


2203 


5207 


.23 


.21 


boyer 


setup 


4496 


4513 


4347 


3755 


3 


106 


8 


6 


1152 3496 


1316 


92 


31 


6056 


.29 


.23 


boyer 


setup,run-boyer 


4497 


39501 


4347 


4302 


3 


924 


25 


13 


83925 3684 38370 


1377 


254 127356 


4.9 


3.2 



gregorian-: gregorian—> absolute Islamic-: islamic-date eastern-: eastern-orthodox-christmas 



7 Related Work and Conclusion 



Regular-tree-grammar-based constraints have been used for analyzing recursive
data in other applications and go back at least to Reynolds and Schwartz HOI.
Related work includes flow analysis for memory optimization by Jones and Muchnick m,
binding-time analysis for partial evaluation by Mogensen m, set-based
analysis of ML by Heintze, type inference by Aiken et al. |2IJ|, backward
slicing by Reps and Turnidge EH, and set-based analysis for debugging Scheme
by Flanagan and Felleisen H3|. Some of these are general type inference and are
only shown to be decidable 0 or take exponential time in the worst case 0. For
others, either a cubic time complexity is given based on a simple worst-case
analysis of a relatively straightforward algorithm inm, or algorithm complexity
is not discussed explicitly mm.

Constraints have also been used for other analyses, in particular analyses
handling mainly higher-order functions or pointers. These include higher-order
binding-time analysis by Henglein m, Bondorf and Jørgensen 0, and Birkedal
and Welinder, points-to analysis by Steensgaard, and control-flow analysis
for special cases by Heintze and McAllester m. The last restricts type sizes
and has a linear time complexity, and the others use union-find algorithms HOI
and have an almost linear time complexity. These analyses either do not consider
recursive data structures 1201441, or use bounded domains Itil4ffill9l and are thus
less precise than grammar constraints constructed based on uses of recursive
data in their contexts.

People have studied methods to speed up cubic-time analysis algorithms. For
example, Heintze m describes implementation techniques such as dependency-directed
updating and special representations, which share the idea of incremental
update by finite differencing and efficient access by real-time simulation.
Flanagan and Felleisen m study techniques for component-wise simplification.
Fähndrich et al. El study a technique for eliminating cycles in the inclusion
constraint graphs. Su et al. El study techniques for reducing redundancies caused
by transitivity in the constraint graphs. These improvements are all found to
be very effective. Moreover, a careful implementation of a worst-case cubic-time
j1 (or quadratic-time El) analysis algorithm sometimes seems to give nearly
linear behavior EEaisi. Our work in this paper is a start in the formal study of
the reasons.

Our analysis adds edges through selecting components of constructions and
enabling conditions, and our application also has the cycle and redundancy
problems caused by dynamic transitivity, as studied in EESI. However, our
algorithm still proceeds in a linear fashion. That is, if we have constraints
$N_1 \to N_2, \ldots, N_{k-1} \to N_k$, we do not add any edges $N_i \to N_j$ for any $i, j$
such that $1 < i < j < k$; only when a new $R$ is added do we add an $N_{k-1} \to R$,
if it is not already added, and subsequently an $N_{k-2} \to R$, and so on. This
formalizes Heintze's algorithm m. For comparison, future work would be to
formalize the algorithms in mm. It will also be interesting to formalize and
compare with m. As our problem is related to computing Datalog queries, it
will be worthwhile to see to what degree McAllester's complexity results for
Datalog queries |^ could be applied; note, however, that those results are
obtained based on extensive hashing and thus are for average cases, not worst
cases. Compared with the magic-sets transformation m, methods based on finite
differencing or incrementalization m derive more specialized algorithms and
data structures, yielding more efficient programs, often asymptotically better.
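The propagation order described above can be sketched in C under simplifying assumptions. We assume a single chain $N_1 \to \ldots \to N_k$, with right-hand sides $R$ numbered 0 to 63 and stored as a bitset. This is an illustration, not the authors' implementation, and the names chain and chain_add are hypothetical. The point is that shortcut edges are never materialized, and each (nonterminal, R) pair is created at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one chain of constraints N_1 -> N_2 -> ... -> N_k.
   rhs[i] is a bitset of the right-hand sides R (numbered 0..63) already
   recorded at N_{i+1}. */
typedef struct {
    uint64_t *rhs;   /* rhs[0] is N_1, rhs[k - 1] is N_k */
    size_t k;
} chain;

/* Record N_j -> R (j is 1-based) and push R backwards: add N_{j-1} -> R,
   then N_{j-2} -> R, and so on, each only if it is not already present.
   No shortcut edge N_i -> N_j is ever created. Returns the number of new
   (nonterminal, R) pairs. */
size_t chain_add(chain *c, size_t j, unsigned r)
{
    assert(j >= 1 && j <= c->k && r < 64);
    uint64_t bit = (uint64_t)1 << r;
    size_t added = 0;
    for (size_t i = j; i >= 1; i--) {
        if (c->rhs[i - 1] & bit)
            break;   /* N_i already has R; if every update goes through
                        chain_add, so do N_1 .. N_{i-1} */
        c->rhs[i - 1] |= bit;
        added++;
    }
    return added;
}
```

For example, on a chain of five nonterminals, `chain_add(&c, 5, 3)` creates five pairs the first time and none the second time. Over any sequence of calls, the total work is therefore bounded by the number of distinct (nonterminal, R) pairs.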

To summarize, for the problem of dead-code elimination on recursive data,
this paper shows that formal specification, design, and analysis lead to an
efficient algorithm with exact complexity factors. Clearly, there is a large body
of work on all kinds of program analysis algorithms IHDI, from type inference
algorithms, e.g., El, to efficient fixed-point computation, e.g., H2|. Precise and
unified specification, design, and complexity analysis of all kinds of program
analysis algorithms deserve much further study. We believe that such study can
benefit greatly from the approach of Paige et al. |34l,'-i2liSI3,'-il7], as illustrated in
this work, and from the more formal characterization by Goyal El.

A An Example Program

Program. A program is a set of recursive function definitions, together with a set of
constructor definitions, each with the corresponding tester and selectors.

f(x) = if null(x) then nil else cons(g(car(x)), f(cdr(x)));
g(x) = x*x*x*x*x;
len(x) = if null(x) then 0 else 1 + len(cdr(x));
lenf(x) = len(f(x));
cons : cons?(car, cdr);
nil : null();
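For readers more at home in C, the example program can be transcribed as follows. This is a hypothetical sketch: the cell type, make_cons and free_list are our assumptions, NULL plays the role of nil, and lenf frees the intermediate list, which the original program does not need to do. The comment on lenf points at exactly the computations the analysis finds dead.

```c
#include <stdlib.h>

/* Hypothetical C transcription of the example program: NULL plays the role
   of nil, and a heap cell plays the role of cons(car, cdr). */
typedef struct cell {
    unsigned car;        /* unsigned, so that x*x*x*x*x wraps instead of overflowing */
    struct cell *cdr;
} cell;

cell *make_cons(unsigned car, cell *cdr)
{
    cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->car = car;
    c->cdr = cdr;
    return c;
}

void free_list(cell *x)
{
    while (x != NULL) {
        cell *next = x->cdr;
        free(x);
        x = next;
    }
}

/* g(x) = x*x*x*x*x */
unsigned g(unsigned x) { return x * x * x * x * x; }

/* f(x) = if null(x) then nil else cons(g(car(x)), f(cdr(x))) */
cell *f(const cell *x)
{
    if (x == NULL)
        return NULL;
    return make_cons(g(x->car), f(x->cdr));
}

/* len(x) = if null(x) then 0 else 1 + len(cdr(x)) */
int len(const cell *x) { return x == NULL ? 0 : 1 + len(x->cdr); }

/* lenf(x) = len(f(x)): only the spine of f's result is inspected, so the
   values g(car(x)) computed inside f are never used. */
int lenf(const cell *x)
{
    cell *y = f(x);
    int n = len(y);
    free_list(y);        /* not in the original; avoids leaking the list */
    return n;
}
```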



Labeled Program. The program is labeled, with a distinct nonterminal associated with 
each program point, as follows: 

f(^36 x) = ^35 if ^34 null(^33 x) then ^32 nil else ^31 cons(^30 g(^29 car(^28 x)), ^27 f(^26 cdr(^25 x)));

g(^24 x) = ^23 ^22 ^21 ^20 ^19 x * ^18 x * ^17 x * ^16 x * ^15 x;

len(^14 x) = ^13 if ^12 null(^11 x) then ^10 0 else ^9 ^8 1 + ^7 len(^6 cdr(^5 x));

lenf(^4 x) = ^3 len(^2 f(^1 x));

Constructed Grammar. The grammar constructed from the given program is 

^36^ ^33' 1^28’ ^36^ ^2b' ^33^ [n94]cons(NQ . Nq), ^ 33 —* [Ar 34 ]nU(), ^ 34 ^ [^ 35 ]!., ^ 33 ^ V 35 , 

^28^ [.V29]cons(lV29, Nq). M 29 — * [^ 30 )^ 24 , V 23 — * V 30 , M 30 — * car(AT34), 

V25^ [Al26]'^hns(lVo, IV 26 ), ^26—* [V27J1V36, ^ 35 —* IV 27 , M 27 — * ^ 31 ^ .V 35 , 



1V24^ IV49, V24— * AI18, .V24— * V47, 1V24^ Vxg, 1V24^ ^45, M19— * [lV2o]i:-, » [^20]^-, [N2i]L, 

Ni 7 ^ [N 24 ]L. N 21 ^ [N 22 ]r. [^22]^, AI22— * [^23]^’ .^45—* [1V23]!., 

V44- iV44, N44^ ATg, AT44^ [iV42lcons( No , ATp) , N44- [N42]n.i(), ^43^ [^43]!,, ^49^ N43, 

Ng- [Ng]!,, Ng^ [Ns]eons(No, Ne). -^6 - [^71^44, ^43^ [Ng]L, Ng^ N43, 

N4 ^ N4 , N4 ^ [N2I «36 . «35 ^ «2 . V2 ^ [N3] N44 , N43 ^ N3 , No ^ n 

User Query. A user query is 

Ni->L 

Simplification Result. The output of simplification, sorted by nonterminal number, 
is 



A’ae— ‘ nilQ, 


, cons(NQ, 


No) 


. Nge^ 


cons{NQ , 


N26). 


— ' niZ(), Nff — ► cotis(Aq, Aq), 




^35^ nilQ, 


1 A‘35^ cons(NQ, 


No) 


. N 35 ^ 


cons{NQ , 


Ng), 


Nio^L, 




A 34 ^ L, 












Nq^ L, 




iV33-* nii() 


, AT33^ cons(NQ, 


No) 








Ag- L, 




iV32-*- nii() 


, AT32— *• cons(NQ, 


No) 


. N 32 -> 


cons(NQ , 


Ng), 


Nj^ L, 




iVsi-* niZ() 


, — ► cons(NQ, 


No) 


. N 34 ^ 


cons{NQ , 


Ng). 


Nq^ nilO, NQ^cons(No, Aq). Nq^ 


cons(AQ, Aq), 


^ 27 —*- niZ() 


, N2J — ► cotis(Aq, 


No) 


, N 27 -> 


cons(NQ , 


Ng). 


A5— » cons(NQ , Aq), 




■^ 26 —* nilO 


, A‘ 26 — * cons(NQ, 


No) 


, N20-> 


cons(NQ , 


Ngg). 


A4^ nil(), N4^ cons(NQ, Aq), A4^ 


cons(AQ, A2 q), 


A25 — ‘ consl 


^-^0’ -^ 26 )’ 










A3^ L, 




N44-nil(). 


, — ► cotis(Aq, 


No) 


. N 44 ^ 


cons(NQ , 


Ng). 


A2— niZ(), A2^cons(Ao, Aq), A2 — 


cons(AQ, Aq), 


V 43 -i-. 












Aj^ — ^ nil{) , A]^ — > cons (Aq , Aq ) , Aj^ — ► 


cons(AQ, A2 q), 



N42— r, Nq^d 



Nonterminals N15 to N24 and N28 to N30 do not have a right-side good form. The
corresponding program points are dead.
Abstract. Computers manipulate approximations of real numbers,
called floating-point numbers. The calculations they make are accurate
enough for most applications. Unfortunately, in some (catastrophic)
situations, the floating-point operations lose so much precision that they
quickly become irrelevant. In this article, we review some of the problems
one can encounter, focusing on the IEEE754-1985 norm. We give
a (sketch of a) semantics of its basic operations, then abstract them (in
the sense of abstract interpretation) to extract information about the
possible loss of precision. The expected application is abstract debugging
of software ranging from simple on-board systems (which use more
and more off-the-shelf micro-processors with floating-point units) to
scientific codes. The abstract analysis is demonstrated on simple examples
and compared with related work.



1 Introduction 

Everybody knows that computers calculate numerical results which are mostly
wrong, yet they are intensively used for simulating highly complex physical
processes and for predicting their behavior. Transcendental numbers (like π and e)
cannot be represented exactly in a computer, since machines only use finite
implementations of numbers (floating-point numbers instead of mathematical real
numbers); they are truncated to a given number of digits. Less known is that
the usual algebraic laws (associativity, for instance) that we use when thinking
about numbers are no longer true in general when it comes to manipulating
floating-point numbers.

It is actually surprising that very few studies on static analysis of floating-point
operations or on their semantic foundations have been carried out. Our
point of view in this article is that there are “numerical bugs” that a programmer
can encounter, and that some are amenable to automatic detection by
static analysis of the source code, using abstract interpretation. This new sort of
bug includes what is normally called a bug, i.e. run-time errors (here, for instance,
uncaught numerical exceptions), but also more subtle ones about the relevance
of the numerical calculations that are made. We advocate that it is as much
of a bug to terminate on a “segmentation fault” as to terminate with a completely
meaningless numerical result (which might be used to control a physical
apparatus with catastrophic consequences).

* This work was supported by the RTD project IST-1999-20527 “DAEDALUS”. This
paper follows a seminar given at École Normale Supérieure in June 1998.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 2.14- E^ 2001.
© Springer-Verlag Berlin Heidelberg 2001

This problem is not very well known to programmers of non-scientific codes.
Let us give just one example showing that it also matters in the non-scientific
computing world. On the 25th of February 1991, during the Gulf War,
a Patriot anti-missile missed a Scud in Dhahran, which in turn crashed onto an
American barracks, killing 28 soldiers. The official enquiry report (GAO/IMTEC-92-26)
attributed this to a fairly simple “numerical bug”. An internal clock that
delivers a tick every tenth of a second controlled the missile. Internal time was
converted into seconds by multiplying the number of ticks by 1/10 in a 24-bit
register. But 1/10 = 0.00011001100110011001100... in binary format, i.e. it is not
represented in an exact manner in memory. This produced a truncation error of
about 0.000000095 (decimal), which made the internally computed time drift with
respect to ground systems. The battery had been in operation for about 100 hours,
which produced a drift of about 0.34 seconds. A Scud flies at about 1676 m/s, so
the clock error corresponded to a localization error of about 500 meters. The
proximity sensors supposed to trigger the explosion of the anti-missile could not
find the Scud, and therefore the Scud fell and hit the ground, exploding onto the
barracks.
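The drift figures above can be replayed in a few lines of C. This sketch rests on an assumption read off the binary expansion quoted above: that the register kept 23 binary digits after the point. The function names are hypothetical. Note that 0.1 is itself a double approximation of 1/10, but its error (about 5.6e-18) is negligible here.

```c
/* Chop 1/10 to `frac_bits` binary digits after the point, as a fixed-point
   register would (truncation toward zero, not rounding). */
double chopped_tenth(int frac_bits)
{
    double scale = 1.0;
    for (int i = 0; i < frac_bits; i++)
        scale *= 2.0;                              /* scale = 2^frac_bits */
    return (double)(long long)(0.1 * scale) / scale;
}

/* Accumulated clock error, in seconds, after `hours` of operation with one
   tick every tenth of a second. */
double clock_drift(double hours, int frac_bits)
{
    double ticks = hours * 3600.0 * 10.0;
    return ticks * (0.1 - chopped_tenth(frac_bits));
}
```

With 23 fractional bits, the per-tick error is about 9.54e-8 and `clock_drift(100.0, 23)` is about 0.343 s. Multiplied by 1676 m/s, this gives roughly 575 m, the same order as the localization error quoted above.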

Actually, more and more critical or on-board systems are using off-the-shelf
floating-point units, which previously had not been approved for such use. Therefore we
believe that static analysis of floating-point operations is going to be very
important in the near future, for safety-critical software as well as for numerical
applications at large.

These kinds of problems are better known in scientific computing, at least
when modeling the physical phenomena to be simulated. What we mean is that
in many cases, the discretizations of the (continuous) problems that are modeled
are sufficiently stable that small truncation errors do not overly affect the
result of their simulation. Unfortunately, it is difficult to find the exact semantics of
floating-point operations, and even when using some well-behaved numerical schemes,
some unpredictable numerical errors can show up. Also, some problems are
inherently ill-conditioned, meaning that their sensitivity to numerical errors is
very high. In this latter case it is in general very difficult to assess the relevance
of the numerical simulation, even by hand.



Organization of the Paper. In Sect. 2 we explain what model of floating-point
arithmetic we want to analyze (IEEE754-1985). We carry on in Sect. 2.1 by
explaining what kind of properties we want to synthesize by the analysis. Then
in Sect. 2.2 we give the syntax and informal meaning of a simple imperative toy
language manipulating floating-point numbers; we give a first sketch of a formal
semantics in Sect. 2.3 that we refine in Sect. 3.2.

In Sect. 3 we present a few abstract domains that are candidates for the
abstract interpretation of the concrete semantics. We give an example of abstract
analysis in Sect. 0. We give some directions for improvement in Sect. 5.1 and
compare with existing related work in Sect. |H1. We conclude by giving some future
directions of work in Sect. 0.

2 The IEEE 754 Norm 

The IEEE754-1985 norm specifies how real numbers are represented in memory¹
using floating-point numbers, see [Gol91, Kah96]. The norm itself relies on a
simple observation:

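Lemma 1. The map

$$\{-1,1\} \times [0,1[ \times \mathbb{Z} \to \mathbb{R}^*, \qquad (s, f, k) \mapsto s(1+f)2^k$$

is a bijection with inverse:

$$g : \mathbb{R}^* \to \{-1,1\} \times [0,1[ \times \mathbb{Z}, \qquad x \mapsto (s(x), f(x), k(x))$$

with $s(x)$ being the sign of $x$, $k(x) = \lfloor \log_2(|x|) \rfloor$, where $\lfloor u \rfloor$ denotes the integral
part² of $u$ and $\log_2$ is the logarithm in base 2, and $f(x) = |x| / 2^{k(x)} - 1$.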
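The decomposition of Lemma 1 can be computed without evaluating $\log_2$ at all, by exact halving and doubling. Below is a sketch in C, assuming IEEE double arithmetic, where multiplying or dividing by 2 is exact as long as no overflow or underflow occurs. The name decompose is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: compute (s, f, k) with x = s * (1 + f) * 2^k,
   s in {-1, 1} and 0 <= f < 1, as in Lemma 1. Only exact scalings by 2
   are used, so no rounding error is introduced. */
void decompose(double x, int *s, double *f, int *k)
{
    assert(x - x == 0.0 && x != 0.0);   /* finite (x - x is NaN for inf/NaN) and nonzero */
    *s = (x < 0.0) ? -1 : 1;
    double m = (x < 0.0) ? -x : x;      /* |x| */
    int e = 0;
    while (m >= 2.0) { m /= 2.0; e++; }
    while (m < 1.0)  { m *= 2.0; e--; }
    *k = e;                             /* k(x) = floor(log2 |x|) */
    *f = m - 1.0;                       /* f(x) = |x| / 2^k(x) - 1, in [0, 1) */
}
```

For instance, 6 decomposes as (1, 0.5, 2), since 6 = 1.5 · 2², and −0.75 decomposes as (−1, 0.5, −1).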

Taking a representation with a fixed number of bits K for exponents (function
k(x)) and a fixed number of bits N for the mantissa (function f(x) or m(x) =
1 + f(x)), the norm defines several kinds of floating-point numbers.




Fig. 1. Representation of a floating-point number in memory. 



- The standard numbers, $r = s \cdot n \cdot 2^{k+1-N}$ with $s \in \{-1, 1\}$, $1 - 2^K < k < 2^K$,
$0 < n < 2^N$, normalized so that $r = s \cdot 2^k (1 + f)$ with $f < 1$,
- Denormalized numbers (to manage “underflow” in a gradual manner),
$r = s \cdot n \cdot 2^{k+1-N} = s \cdot 2^k (0 + f)$ with $k = 2 - 2^K$ and $0 < n < 2^{N-1}$, i.e. $0 < f < 1$,
- $+\infty$ and $-\infty$ (notice that their inverses, $+0$ and $-0$, are also there),
- NaN, “Not a Number”, signed or not (which are the results of dubious
operations such as $0 \cdot \infty$).

¹ But not in the registers of micro-processors.

² i.e. the greatest integer less than or equal to $u$. We will also use $\lceil u \rceil$, which is the least
integer greater than or equal to $u$.

Normalized numbers come in several versions, according to different choices
of K and N, thus allowing more or less precision at will. Single precision (REAL*4,
float) has K = 7 and N = 24, double precision (REAL*8, double) has K = 10,
N = 53, and double extended (REAL*10 etc., long double) has K ≥ 14, N ≥ 64.
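The memory layout of Fig. 1 can be inspected directly in C. The sketch below assumes that float is IEEE single precision. Under the convention above, where $1 - 2^K < k < 2^K$ with K = 7, the biased exponent field has K + 1 = 8 bits and the stored fraction has N − 1 = 23 bits. The enum and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Kinds of single-precision values listed above (names are hypothetical). */
typedef enum { FP_KIND_ZERO, FP_KIND_DENORMAL, FP_KIND_NORMAL,
               FP_KIND_INFINITY, FP_KIND_NAN } fp_kind;

/* Reinterpret a 32-bit pattern as a float (assumes IEEE binary32). */
float float_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Split x into sign bit, 8-bit biased exponent and 23-bit fraction,
   and classify it. */
fp_kind classify_float(float x, int *sign, int *biased_exp, uint32_t *frac)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    *sign = (int)(bits >> 31);
    *biased_exp = (int)((bits >> 23) & 0xFFu);
    *frac = bits & 0x7FFFFFu;
    if (*biased_exp == 0xFF)
        return *frac == 0 ? FP_KIND_INFINITY : FP_KIND_NAN;
    if (*biased_exp == 0)
        return *frac == 0 ? FP_KIND_ZERO : FP_KIND_DENORMAL;
    return FP_KIND_NORMAL;   /* exponent k = biased_exp - 127 */
}
```

For example, −2.5 = −1.25 · 2¹ has sign bit 1, biased exponent 128 and fraction 0x200000 (that is, 0.25 · 2²³).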

Just to give an order of magnitude of the numbers we are talking about, let
us show a few examples. For a single-precision float, the maximum normalized number
is $3.40282347 \times 10^{38}$, the minimum positive normalized number is $1.17549435 \times 10^{-38}$,
the maximum denormalized number is $1.17549421 \times 10^{-38}$ and the minimum positive
denormalized number is $1.40129846 \times 10^{-45}$. Around 1, the maximal error
(“unit in the last place”, or ulp, or ulp(1)) is $2^{-23}$ for a single-precision float, i.e. about
$1.1920928955 \times 10^{-7}$.
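The magnitudes quoted above can be read from the standard C header `<float.h>`, using FLT_MAX, FLT_MIN, FLT_EPSILON and the C11 macro FLT_TRUE_MIN. This is a small sketch assuming float is IEEE single precision. The struct and function names are hypothetical. The largest denormal is not a standard macro; here it is obtained as FLT_MIN minus one denormal step, which is exact.

```c
#include <float.h>

/* The single-precision magnitudes discussed above (names are hypothetical). */
typedef struct {
    float max_normal, min_normal, max_denormal, min_denormal, ulp_one;
} float_magnitudes;

float_magnitudes float_magnitudes_get(void)
{
    float_magnitudes m;
    m.max_normal   = FLT_MAX;                  /* about 3.40282347e38 */
    m.min_normal   = FLT_MIN;                  /* 2^-126, about 1.17549435e-38 */
    m.min_denormal = FLT_TRUE_MIN;             /* 2^-149, about 1.40129846e-45 (C11) */
    m.max_denormal = FLT_MIN - FLT_TRUE_MIN;   /* (2^23 - 1) * 2^-149, exact */
    m.ulp_one      = FLT_EPSILON;              /* 2^-23 = ulp(1) */
    return m;
}
```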

The norm also specifies some properties of some of the computations we can
make on floating-point numbers. For instance, the norm specifies that +, -, *, /
are computed with an inaccuracy that cannot go beyond the ulp around the
exact result (if there is no “overflow”).

The norm allows the user to use different round-off methods. One can use
round-off towards zero, round-off towards the nearest, round-off towards plus
infinity, and round-off towards minus infinity. A more subtle rule is that when
we have the choice between two roundings (in the round-off towards the nearest
mode), we choose the even mantissa. In fact, the norm even specifies that $x \odot y$
(where $\odot$ is one of the floating-point operations +, -, *, /, √ on floating-point
numbers x and y) is the rounding (in the corresponding rounding mode) of $x \circ y$
(where $\circ$ is the corresponding operation in $\mathbb{R}$).
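Tie-breaking towards the even mantissa is easy to observe in C. This assumes an IEEE platform in the default round-to-nearest mode, and the function name is hypothetical. Above 2^24 = 16777216, consecutive floats are 2 apart, so every odd integer there lies exactly halfway between two floats.

```c
#include <stdint.h>

/* Convert an integer to single precision in the current (default:
   round-to-nearest) mode. The volatile store forces the value to be
   rounded to float even if the compiler evaluates in wider precision. */
float round_to_float(int32_t n)
{
    volatile float f = (float)n;
    return f;
}
```

Here 16777217 = 2^24 + 1 rounds down to 16777216, whose mantissa is even. By contrast, 16777219 = 2^24 + 3 rounds up to 16777220, because 16777218 has an odd mantissa.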

The conversions are to be given an explicit semantics as well. More annoying 
is that we should take care of the order of evaluation (in conflict with compiler 
optimizations!), since the round-offs destroy associativity in general. 
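Here is a minimal C illustration of how round-off destroys associativity. It is our own example, with hypothetical function names. Each partial sum is stored in a volatile float so that it is really rounded to single precision, whatever evaluation precision the compiler would otherwise use.

```c
/* (a + b) + c in single precision, rounding after each addition. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* a + (b + c) in single precision, rounding after each addition. */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;
    volatile float r = a + t;
    return r;
}
```

With a = 1, b = 10^8 and c = −10^8, `sum_left` returns 0 and `sum_right` returns 1. Floats near 10^8 are 8 apart, so 10^8 + 1 rounds back to 10^8 before c is added.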

Caveats. As we said in the beginning of this section, the norm specifies what
happens in memory but not in processor registers. There are conversions between
memory and registers that we have to know about. In general (except on
M680x0 and ix86/ix87, where all operations are computed in double extended
before round-off), registers are like main memory. There can be some differences
with RISC processors as well, like the IBM PowerPC or Apple Power Macintosh,
because of the use of compound instructions (multiply-add etc.), which
do not use the same round-off methods. Most machines follow the norm
anyway, but not all compilers do, in particular concerning the way they handle
(or do not handle!) arithmetic exceptions (underflow etc.). CRAY used to have
a different arithmetic, which is a problem for the actual applicability of our methods
to scientific computing. Fortunately, it seems that it is now converging towards
the norm. We have seen cases in which porting a scientific code to a computer
with a different arithmetic produces dramatic changes.

³ This is done by the processor using extra “guard digits” when computing the
operations.

Another problem is to know how to deal with the other mathematical operations
(like the ones in <math.h> in C). In general we have to know the algorithm
or its specifications (sometimes given by library providers). The problem of having
“good” libraries of transcendental functions is well known in the literature,
as the “Table maker's dilemma” II AIT'i.sl. In this article we will stick to the core
of the norm and consider only “simple” operations.



2.1 Examples and Properties of Interest 



Our aim is to be able to analyze at compile-time the way floating-point
operations are used or misused.

What we intend to find automatically is at least the exceptions that might be
raised (and not caught), like “Overflow”, “Underflow” and “NaN”. This could
be handled with other well-known analyses (interval analysis as used in Syntox,
polyhedra iTTHTI^, etc.), so we will not describe this part much. What
we really would like to find is some not-too-pessimistic information about the
precision of the values of the variables. This leads to estimates of the reliability
of branching in tests, and we expect it to partially solve some difficult termination
problems (see Example 1).



Example 1. Consider the expression x = which leads on an UltraSparc 

in single precision, for c1 = 0, c2 = 1, b1 = -46099201, b2 = -35738642,
a1 = 37639840 and a2 = 29180479, to x = 1046769994 (the true result is x =
-46099201). This is an example of a problem known as “cancellation”. The
control flow might be wrong after this instruction, if it were followed by the
(somewhat unlikely!) instructions:



if (x==-46099201) { . . . } 
else { . . . } 

or non-termination could happen since this could be the termination test of a 
loop. 



Here are some simple (and classic) examples of stable and unstable numerical
computations:

Example 2. Consider the following two implementations of the computation of
the nth power of the golden number ($g = (\sqrt{5} - 1)/2$). The first one (program
(A)) relies on the simple property that if $u_n$ is the nth power of the golden number,
$u_{n+2} = u_n - u_{n+1}$. The second one (program (B)) is the brute-force approach.
Program (A):

#include <stdio.h>
#include <math.h>

int main(void)
{
  float x, y, z;
  int i;
  x = 1;
  y = (sqrt(5) - 1) / 2;
  for (i = 1; i <= 20; i++) {
    z = x;
    x = y;
    y = z - y;                 /* u(n+2) = u(n) - u(n+1) */
    printf("phi^%d=%f\n", i, x);
  }
  return 0;
}

Program (B):

#include <stdio.h>
#include <math.h>

int main(void)
{
  float t;
  int i;
  t = 1;
  for (i = 1; i <= 20; i++) {
    t = t * (sqrt(5) - 1) / 2;  /* direct multiplication by g */
    printf("phi^%d=%f\n", i, t);
  }
  return 0;
}

Program (A) gives the following results:

phi^1=0.618034
phi^2=0.381966
phi^3=0.236068
phi^4=0.145898
phi^5=0.090170
phi^6=0.055728
phi^7=0.034442
phi^8=0.021286
phi^9=0.013156
phi^10=0.008130
phi^11=0.005026
phi^12=0.003103
phi^13=0.001923
phi^14=0.001180
phi^15=0.000743
phi^16=0.000437
phi^17=0.000306
phi^18=0.000131
phi^19=0.000176
phi^20=-0.000045



This of course does not make much sense! The fact is that the numerical scheme
used in program (A) is not well-conditioned, meaning that it is very sensitive
to the initial inaccuracy. In fact, the initial inaccuracy in the computation of
$(\sqrt{5} - 1)/2$, which is of the order of ulp(1) at most, is increased at each iteration
and becomes more important than the real result.

Program (B) leads to the following results:

phi^1=0.618034
phi^2=0.381966
phi^3=0.236068
phi^4=0.145898
phi^5=0.090170
phi^6=0.055728
phi^7=0.034442
phi^8=0.021286
phi^9=0.013156
phi^10=0.008131
phi^11=0.005025
phi^12=0.003106
phi^13=0.001919
phi^14=0.001186
phi^15=0.000733
phi^16=0.000453
phi^17=0.000280
phi^18=0.000173
phi^19=0.000107
phi^20=0.000066



This is in fact completely acceptable. Now take program (C) below, which looks
like program (A) (at least it does not look simpler):

x=1;
y=-1.0/3.0;
for (i=1; i<=20; i++) {
  z=x;
  x=y;
  y=(x+z)/6;
  printf("phi^%d=%f\n", i, x); }



The results that are computed are accurate (they are roundings of $(-1/3)^n$):

phi^1=-0.333333
phi^2=0.111111
phi^3=-0.037037
phi^4=0.012346
phi^5=-0.004115
phi^6=0.001372
phi^7=-0.000457
phi^8=0.000152
phi^9=-0.000051
phi^10=0.000017
phi^11=-0.000006
phi^12=0.000002
phi^13=-0.000001
phi^14=0.000000
phi^15=-0.000000
phi^16=0.000000
phi^17=-0.000000
phi^18=0.000000
phi^19=-0.000000
phi^20=0.000000



This “numerical scheme” is well-conditioned, i.e. stable. 
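The difference between (A) and (C) can be made explicit with the standard characteristic-equation argument for linear recurrences. This derivation is our own illustration and is not taken from the text. Write $u_n$ for the value printed at step n:

$$\text{(A)}: u_{n+2} = u_n - u_{n+1} \;\Rightarrow\; r^2 + r - 1 = 0 \;\Rightarrow\; r \in \{g, -1/g\}, \quad g \approx 0.618, \; -1/g \approx -1.618$$

$$\text{(C)}: 6u_{n+2} = u_{n+1} + u_n \;\Rightarrow\; 6r^2 - r - 1 = 0 \;\Rightarrow\; r \in \{-1/3, 1/2\}$$

$$u^{(A)}_n = \alpha g^n + \beta (-1/g)^n, \qquad u^{(C)}_n = \alpha' (-1/3)^n + \beta' (1/2)^n$$

The exact initial values give $\beta = \beta' = 0$. Rounding the initial value of y, however, makes $\beta$ and $\beta'$ nonzero, of the order of ulp(1).

In (A), the parasitic term grows like $1.618^n$ ($1.618^{20} \approx 1.5 \times 10^4$), while the wanted term decays like $g^n$ ($g^{20} \approx 6.6 \times 10^{-5}$). This is why the computed values become meaningless near n = 20.

In (C), the parasitic term decays like $2^{-n}$, so absolute errors are damped rather than amplified. The term still decays more slowly than $(-1/3)^n$, so the relative error grows like $(3/2)^n$, which is invisible at the six printed decimals.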



2.2 A Language 

In the language we consider in this paper, we confine ourselves to simple floating-point
operations (which are fully specified in the IEEE754-1985 norm), with one
type of floating-point number only (no double precision nor casts here):

Expr  = cste          constant real expression
      | X             variable, X ∈ Var
      | Expr + Expr   sum
      | Expr * Expr   product
      | Expr - Expr   difference
      | Expr / Expr   division
      | √Expr         square root
      | (Expr)        bracketing

The idea is that the evaluation of arithmetic expressions is determined by
the syntax (left to right, innermost to outermost evaluation here). We confine
ourselves in this paper to very simple test expressions, as follows:

test  = X == 0        zero
      | X > 0         strict positivity
      | X ≥ 0         positivity

Instructions are:

Instr = X = Expr                          assignment, for X ∈ Var
      | if test then block else block     conditional statement
      | while test block                  while loop

We have used in the examples “equivalent” C forms of programs in this
syntax. Blocks of instructions are concatenations of instructions:

block = ∅               empty block
      | Instr; block    block concatenation

Finally, a program P is just a block.
2.3 A (Almost) Standard Concrete Semantics

We plunge floating-point numbers (parameterized here by N and K, the length
of the binary words representing respectively the mantissa and the exponent)
into Val, which is the union of the (mathematical) real numbers $\mathbb{R}$ extended
with the values $\{\infty, -\infty, NaN, \omega, \upsilon, \delta, \sigma\}$. These stand respectively for $+\infty$ and $-\infty$
(mathematical infinities, coming from a compactification of the set of reals, for
instance); NaN, a special element denoting the value “not a number”; $\omega$,
denoting overflow; $\upsilon$, underflow; $\delta$, division by zero error; and $\sigma$, the error
resulting from taking the square root of a strictly negative number. The semantics
is given as a transition system, where states are elements of $Ctrl \times Env$, where
$Env = Var \to Val$ and Ctrl is the text of the program yet to be executed.
The semantics also depends on the round-off mode $M : Val \to Val$ (a partial
function⁴) and on the use (or not) of some standard handlers in case of overflow,
taken care of by a (partial) function $\mathcal{E} : Val \to Val$. By convention, all our
(partial) functions (if not otherwise stated) will not be defined on the “errors” $\omega$, $\upsilon$,
$\delta$ and $\sigma$, nor on NaN, and will be the identity on $\infty$ and $-\infty$. For the sake of
simplicity, we will consider only normalized floating-point numbers and will not
use signed NaN nor signed zero. We now redefine the following mathematical
functions acting on Val:

- We “overload” the exponent function we had in Lemma 1: $k : Val \to Val$
is the exponent (partial) function with $k(\infty) = k(-\infty) = \infty$,
$k(x) = \max(\lfloor \log_2(|x|) \rfloor, 2 - 2^K)$ if $x \in \mathbb{R}$, $x \neq 0$, $k(0) = 0$, and $k(x) = \top$ (i.e.
not defined) in all other cases. This enables us to have the right f function (as in
Lemma 1) and thus the right underflow mechanism.
- $M(x) = s(x) \lfloor |x| \, 2^{N-1-k(x)} \rfloor \, 2^{k(x)+1-N}$ (this is the rounding towards zero mode,
which we write $M_0$ when there is a risk of ambiguity); other modes include
$M(x) = \lceil x \, 2^{N-1-k(x)} \rceil \, 2^{k(x)+1-N}$ (rounding towards plus infinity, or $M_+$)
and $M(x) = \lfloor x \, 2^{N-1-k(x)} \rfloor \, 2^{k(x)+1-N}$ (rounding towards minus infinity, or $M_-$),
- $\mathcal{E}(x) = \omega$ if $|x| > 2^{2^K} - 2^{2^K - N}$, $\mathcal{E}(x) = \upsilon$ if $|x| < 2^{2-2^K}$ (so that we
are not dealing here with “gradual underflow” or denormalized numbers),
otherwise $\mathcal{E}(x) = x$ (this is the “no handler” option).

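We look at the semantics of an expression Expr. Given $\rho \in Env$,

$$[\![\mathrm{cste}]\!]^F\rho = \mathcal{E} \circ M(\mathrm{cste})$$
$$[\![X]\!]^F\rho = \rho(X)$$
$$[\![\mathrm{Expr}_1 + \mathrm{Expr}_2]\!]^F\rho = [\![\mathrm{Expr}_1]\!]^F\rho \;+^F\; [\![\mathrm{Expr}_2]\!]^F\rho$$
$$[\![\mathrm{Expr}_1 * \mathrm{Expr}_2]\!]^F\rho = [\![\mathrm{Expr}_1]\!]^F\rho \;*^F\; [\![\mathrm{Expr}_2]\!]^F\rho$$
$$[\![\mathrm{Expr}_1 / \mathrm{Expr}_2]\!]^F\rho = [\![\mathrm{Expr}_1]\!]^F\rho \;/^F\; [\![\mathrm{Expr}_2]\!]^F\rho$$
$$[\![\mathrm{Expr}_1 - \mathrm{Expr}_2]\!]^F\rho = [\![\mathrm{Expr}_1]\!]^F\rho \;-^F\; [\![\mathrm{Expr}_2]\!]^F\rho$$
$$[\![\sqrt{\mathrm{Expr}}]\!]^F\rho = \mathrm{sqrt}^F\big([\![\mathrm{Expr}]\!]^F\rho\big)$$

⁴ We write in an equivalent manner $M(x) = \top$ and $M(x)$ undefined.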
where the functions $+^F$, $-^F$, $*^F$, $/^F$ and $\mathrm{sqrt}^F$ are defined as follows:

- $a +^F b = \mathcal{E} \circ M(a + b)$,
- $a *^F b = \mathcal{E} \circ M(ab)$,
- $a -^F b = \mathcal{E} \circ M(a - b)$ if a and b are not both the same infinity; in the latter
case, $a -^F b = NaN$,
- $a /^F b = \mathcal{E} \circ M(a/b)$ if $b \neq 0$; if $b = 0$ then $a /^F b = \delta$,
- $\mathrm{sqrt}^F(a) = \mathcal{E} \circ M(\sqrt{a})$ if $a \geq 0$, otherwise $\mathrm{sqrt}^F(a) = \sigma$.

Assignments have the following semantics: $[\![X = \mathrm{Expr}]\!]\rho = \rho[X \leftarrow [\![\mathrm{Expr}]\!]^F\rho]$,
where $\rho[u \leftarrow v]$ denotes the new environment in which $\rho(u)$ is now equal to $v$,
whereas all other variables are mapped to the values they had under $\rho$. Tests are
also quite straightforward ($[\![\mathrm{test}]\!]\rho$ is a boolean value indicating whether the
test is true or not). Transitions from state (Instr; Prog, $\rho$) to (Prog, $\rho'$) are
now rather easy to write down, given the evaluation of expressions above. We
spare the reader the details, given that this is rather standard (in SOS style
Fi^, for instance). In order to be able to write the abstract semantics in an
easier manner in the sequel, we suppose that all expressions are decomposed into
sequences of single operations (like +, - etc., respecting the evaluation strategy).
For instance, the assignment x=y*z+2 will be supposed to be decomposed using
an auxiliary variable t as t=y*z; x=t+2. This refines the transition system
described above by splitting the transition representing the evaluation of a
(complex) expression into a sequence of transitions, one for each simple floating-point
operation.
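The single-operation semantics above, including the decomposition of x=y*z+2 into t=y*z; x=t+2, can be sketched in C for IEEE single precision. The sketch assumes round-to-nearest for M (the text leaves M as a parameter), FLT_EVAL_METHOD == 0 and finite operands. All names are hypothetical, and square root is left out. Rounding a float sum, product or quotient first to double and then to float yields the correctly rounded float, because 53 ≥ 2·24 + 2.

```c
#include <float.h>

/* Status of an operation: a value, or omega (overflow), upsilon
   (underflow), delta (division by zero). Names are hypothetical. */
typedef enum { ST_OK, ST_OMEGA, ST_UPSILON, ST_DELTA } fp_status;

typedef struct {
    fp_status status;
    float value;
} fp_result;

/* E o M: round v to float (M, round-to-nearest), then apply the
   "no handler" E. Underflow is judged on the unrounded value and gradual
   underflow is ignored, as in the simplified semantics. */
static fp_result apply_EM(double v)
{
    volatile float rounded = (float)v;      /* volatile forces the rounding */
    fp_result r = { ST_OK, rounded };
    if (rounded > FLT_MAX || rounded < -FLT_MAX)
        r.status = ST_OMEGA;                /* rounded to an infinity */
    else if (v != 0.0 && v < FLT_MIN && v > -FLT_MIN)
        r.status = ST_UPSILON;              /* below the smallest normal */
    return r;
}

fp_result fp_add(float a, float b) { return apply_EM((double)a + (double)b); }
fp_result fp_mul(float a, float b) { return apply_EM((double)a * (double)b); }

fp_result fp_div(float a, float b)
{
    if (b == 0.0f) {
        fp_result r = { ST_DELTA, 0.0f };
        return r;
    }
    return apply_EM((double)a / (double)b);
}

/* x = y*z + 2, decomposed as t = y*z; x = t + 2: one rounding per
   operation, and an error in the first step is returned unchanged. */
fp_result assign_x(float y, float z)
{
    fp_result t = fp_mul(y, z);
    if (t.status != ST_OK)
        return t;
    return fp_add(t.value, 2.0f);
}
```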

Notations. In the sequel, operations $+^F$, $-^F$ etc. (respectively +, - etc.) will
have to be understood as the floating-point (respectively “real”) operations. We
will also introduce new operations -|-^, — ^ etc. and 0, 0 etc. (next section) and 
_l_a, _a “abstractions” of these operations. 

3 Abstract Domains 

A correct (in the sense of abstract interpretation) domain for abstracting the
semantics above is given by intervals of floating-point numbers (in the style of
F. Bourdoncle's Syntox integer interval analyzer ISDl). Basically, the “best”
correct abstract operations (forward semantics) are:

- the abstraction of the + operation is $[a,b] \oplus [c,d] = [M'_-(a +^F c), M'_+(b +^F d)]$,
- for subtraction: $[a,b] \ominus [c,d] = [M'_-(a -^F d), M'_+(b -^F c)]$,
- for multiplication: $[a,b] \otimes [c,d] = [M'_-(\min(a *^F c, a *^F d, b *^F c, b *^F d)),$
$M'_+(\max(a *^F c, a *^F d, b *^F c, b *^F d))]$,
- for the inverse (here $d \geq c > 0$): $inv^a([c,d]) = [M'_-(1 /^F d), M'_+(1 /^F c)]$,
- for the square root: $\sqrt{[c,d]}^a = [M'_-(\mathrm{sqrt}^F(c)), M'_+(\mathrm{sqrt}^F(d))]$ (when $c, d \geq 0$),
- for tests, one should be cautious with the rule for strict positivity:
$[\![X > 0]\!]\rho = (\rho(X) > 2^{2-2^K})$.
where $\mathcal{M}'$ is the rounding function on the analyzer's internal representation
of floating-point numbers, $+_-$, $+_+$, etc. are the floating-point operations on the
target architecture where the statically analyzed program should be
run (rounded towards $-\infty$ and $+\infty$ respectively), and $K$ is the corresponding number of bits used for representing exponents.
In general, we can (and should) hope for the analyzer to have a better precision
than the target architecture,* and in that case we can simplify the rules above
(forgetting about the $\mathcal{M}'$ in the right-hand side of the definitions). This semantics
has been implemented as part of the abstract domains used in the static analyzer
TWO (ESPRIT project 28940), but it is obviously ill-suited for obtaining
precise information on floating-point operations. Actually, we used an even less
precise semantics, in that we supposed we did not know the rounding mode of
the analyzed program, so we had to assume the worst case, which is, for instance
in the case of the abstraction of addition: $[a,b] \oplus [c,d] = [\mathcal{M}_-(a + c), \mathcal{M}_+(b + d)]$,
where $\mathcal{M}$ is the rounding function corresponding to the target architecture (or
a suitable approximation of it).
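The interval operations above can be sketched in Python as follows. This is only an illustration under stated assumptions: both the analyzer and the target are taken to use IEEE 754 binary64 with round-to-nearest, and since Python exposes no directed rounding modes, each computed bound is pushed one ulp outwards with `math.nextafter` (Python 3.9+), which safely over-approximates the directed roundings $\mathcal{M}'_-$ and $\mathcal{M}'_+$. The `Interval` class and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


def down(x: float) -> float:
    """Safe lower bound for a round-to-nearest result: one ulp towards -inf."""
    return math.nextafter(x, -math.inf)


def up(x: float) -> float:
    """Safe upper bound for a round-to-nearest result: one ulp towards +inf."""
    return math.nextafter(x, math.inf)


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(down(self.lo + other.lo), up(self.hi + other.hi))

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(down(self.lo - other.hi), up(self.hi - other.lo))

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(down(min(products)), up(max(products)))

    def inv(self) -> "Interval":
        # As in the text, only intervals of strictly positive numbers are handled.
        if not self.lo > 0:
            raise ValueError("inv requires 0 < lo")
        return Interval(down(1.0 / self.hi), up(1.0 / self.lo))

    def sqrt(self) -> "Interval":
        if self.lo < 0:
            raise ValueError("sqrt requires lo >= 0")
        return Interval(max(0.0, down(math.sqrt(self.lo))), up(math.sqrt(self.hi)))

    def __contains__(self, x: float) -> bool:
        return self.lo <= x <= self.hi
```

Stepping one ulp outwards can lose up to one ulp per bound compared with true directed rounding; that is the price of staying within the standard library.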

The experiments (described in [...]) show that this kind of analysis
behaves poorly on floating-point code: it is not uncommon for about ten percent of the
lines of a code that use floating-point operations to be signaled as potential
run-time errors (over-pessimistic warnings about the possibility of reaching an
erroneous state, like overflow, division by zero, etc.). Using
this semantics, though, we are able to find real "subtle" bugs, such as in the
program if (x>0) y=1/x*x, where there might be a division-by-zero error** (for instance for very small positive values of x). This abstract semantics is also sufficient to get
good estimates of the 20th iteration of program (B) of the introductory example: it is of order
$[p_0 - 2^{-23}, p_0 + 2^{-23}]^{20}$, i.e. about $[6.61067063328 \times 10^{-5}, 6.61072163724 \times 10^{-5}]$.
Also, the numerical bug of the Patriot software explained in the introduction would
certainly have been found by such interval analyzers, given a correct floating-point
semantics.
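As a sanity check of the bound quoted above, the following sketch raises both endpoints of $[p_0 - 2^{-23}, p_0 + 2^{-23}]$ to the 20th power in binary64, taking $p_0 = (\sqrt{5}-1)/2$ (an assumption consistent with the quoted values). Since $x \mapsto x^{20}$ is increasing on positive numbers, endpoints map to endpoints. It does not model the analyzer's own rounding, and the function name is hypothetical.

```python
import math


def power_bound(k: int = 20, eps: float = 2.0 ** -23) -> tuple[float, float]:
    """Endpoints of [p0 - eps, p0 + eps] ** k, with p0 = (sqrt(5) - 1) / 2."""
    p0 = (math.sqrt(5.0) - 1.0) / 2.0
    # x -> x ** k is increasing for x > 0, so the interval image is given by its endpoints.
    return (p0 - eps) ** k, (p0 + eps) ** k


lo, hi = power_bound()
print(f"[{lo:.11e}, {hi:.11e}]")
```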

But interval semantics is always very conservative and pessimistic: it might
even incorporate the computation error of the analyzer itself ($\mathcal{M}'$)! Secondly,
it aggregates in the abstract value both the magnitude of the expected result
and the inaccuracy error. Also, it does not take into account dependencies between
the values, and especially between the errors. For instance, $x - x$ will always
lead in such abstractions to a non-zero error unless $x$ is a singleton
interval (i.e. a constant). What we really need is a relational abstraction, at least
on inaccuracy values.



3.1 Domain of Affine Forms

The idea here is to trace the instructions (or locations in the program) that
create round-off errors. We associate with each location and variable the way this
control point makes the variable lose precision. This is loosely based on ideas
from affine arithmetic (used in the simulation of programs, not in static
analysis).

* One could actually use multi-precision numbers instead of IEEE 754 double or
extended double types for representing intervals in the static analyzer.

** This is an example taken from a seminar by Alain Deutsch in 1998.

The abstract values (notwithstanding error values) are $x = \alpha_0 + \alpha_1\varepsilon_1 + \cdots + \alpha_n\varepsilon_n$, where the $\varepsilon_i$ are variables, intended to represent random values with range
$]-ulp(1), ulp(1)[$,* associated with each location (describing the loss of precision at that point), the $\alpha_i$ being in an abstract domain $A$ (for example real or
floating-point intervals) abstracting $\wp(\mathbb{R} \cup \{\infty, -\infty\})$, through (for instance) a
Galois connection [CC92a] $\wp(\mathbb{R} \cup \{\infty, -\infty\}) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} A$. Basically $\alpha_0$ should be
an abstraction of the intended result if the program was manipulating real numbers, and the $\alpha_i$ ($i \geq 1$) represent abstractions of each small error due to the
"$i$th operation" in the program.

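Let us make this more precise by first setting $\mathcal{L}$, the set of all locations in
the programs to be analyzed (i.e. all elements of Ctrl in the concrete semantics
given in Sect. 2): $\mathcal{L} = \{c_i \mid i \in L\}$ ($L$ is the set of indices used to describe
elements of $\mathcal{L}$), which we will identify in the sequel with a subset of $\mathbb{N}$. The affine
forms domain, parameterized by $A$ and $\mathcal{L}$, is the domain $\mathcal{D}$ defined as follows:
$\mathcal{D} = \{\alpha_0 + \sum_{i \in L}\alpha_i\varepsilon_i \mid \alpha_0, \alpha_i \in A, i \in L\}$, and the order is defined component-wise:
$\alpha_0 + \sum_{i \in L}\alpha_i\varepsilon_i \leq \alpha'_0 + \sum_{i \in L}\alpha'_i\varepsilon_i$ if and only if $\alpha_0 \leq \alpha'_0$ and $\alpha_i \leq \alpha'_i$ for all $i \in L$.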

Therefore, if $A$ is a lattice, then $\mathcal{D}$ is a lattice with component-wise operations. Similarly, widenings and narrowings defined in $A$ can be extended
in a component-wise manner to generate widenings and narrowings on $\mathcal{D}$. In general, the classical widenings on intervals are not very subtle. We define the following family of widenings: $[a,b] \nabla_k [c,d] = [e,f]$ with

$$e = \begin{cases} c - 2^k(a - c) & \text{if } c < a\\ a & \text{otherwise}\end{cases} \qquad f = \begin{cases} d + 2^k(d - b) & \text{if } d > b\\ b & \text{otherwise}\end{cases}$$

This is only a widening if we suppose that
the boundaries of our intervals use a finite-precision arithmetic (bounded multi-precision, for instance).
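A direct Python transcription of the widening family $\nabla_k$, together with a hypothetical loop `x = 0; while (...) x = x + 1;` analyzed with it. Because the bounds here are binary64 numbers, once the upper bound exceeds $2^{53}$ adding 1 no longer changes it and the iteration stabilizes, which illustrates why finite-precision bounds are needed for $\nabla_k$ to be a widening. Both function names are hypothetical.

```python
def widen(k: int, old: tuple, new: tuple) -> tuple:
    """[a, b] nabla_k [c, d] = [e, f], following the definition above."""
    a, b = old
    c, d = new
    e = c - 2.0 ** k * (a - c) if c < a else a
    f = d + 2.0 ** k * (d - b) if d > b else b
    return e, f


def analyze_counter_loop(k: int, max_steps: int = 100):
    """Abstract iterates for the hypothetical loop `x = 0; while (...) x = x + 1;`."""
    x = (0.0, 0.0)
    for step in range(1, max_steps + 1):
        body = (x[0] + 1.0, x[1] + 1.0)                    # abstract effect of x = x + 1
        joined = (min(x[0], body[0]), max(x[1], body[1]))  # join of iterate and body
        nxt = widen(k, x, joined)
        if nxt == x:
            return x, step                                 # post-fixpoint reached
        x = nxt
    return x, max_steps
```

With `k = 60`, the first widening jumps to $2^{60}$, where $2^{60} + 1$ rounds back to $2^{60}$, so the second iterate is already stable.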

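We can now define a concretization function $\Gamma : \mathcal{D} \to \wp(\mathbb{R} \cup \{\infty, -\infty\})$ by
$\Gamma(\alpha_0 + \sum_{i \in L}\alpha_i\varepsilon_i) = \gamma(\alpha_0) + \sum_{i \in L}\gamma(\alpha_i) \times\, ]-ulp(1), ulp(1)[$.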
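A small Python sketch of a closed over-approximation of $\Gamma$ for an affine form whose coefficients are intervals. `ulp1` stands for $ulp(1)$ and defaults here to $2^{-23}$ (single precision), which is an assumption. Since each $\gamma(\alpha_i) \times\, ]-ulp(1), ulp(1)[$ is contained in $]-m_i\,ulp(1), m_i\,ulp(1)[$ with $m_i = \max(|lo_i|, |hi_i|)$, the radii simply add up; the analyzer's own rounding is ignored and the function name is hypothetical.

```python
def gamma_hull(center, coeffs, ulp1=2.0 ** -23):
    """Closed interval containing Gamma(a_0 + sum_i a_i eps_i).

    center is the interval a_0 = (lo, hi); coeffs maps each location i to the
    interval a_i; every eps_i ranges over ]-ulp1, ulp1[.
    """
    radius = ulp1 * sum(max(abs(lo), abs(hi)) for lo, hi in coeffs.values())
    return center[0] - radius, center[1] + radius
```

For instance, the form $[0,0] + [1,1]\varepsilon_1$ is mapped to the closure of $]-ulp(1), ulp(1)[$.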

The problem is that there is no way we can hope for a very strong correctness
condition (we will give it in detail in Sect. 3.3) for an analysis based on $\mathcal{D}$ with
respect to the concrete semantics given in Sect. 2, because we have not specified
in the concrete semantics what the "real" result should be. Therefore there is
no best choice for what $\alpha_0$ should be (and the same problem holds for the $\alpha_i$,
$i \in L$).** We should in fact have designed a non-standard concrete semantics that
remembers the inaccuracy of the computations, which we shall see now.






* Notice that if we assume the default rounding mode, we could actually use a smaller
interval, i.e. $[-ulp(1)/2, ulp(1)/2]$.

** Mathematically, suppose we have a corresponding abstraction $\mathcal{A} : \wp(Val) \to \mathcal{D}$
making $(\mathcal{A}, \Gamma)$ into a Galois connection. Suppose for instance that $A$ is the interval
domain, and consider $\Gamma(u_1 = [0,0] + [1,1]\varepsilon_1) = ]-ulp(1), ulp(1)[$ and $\Gamma(u_2 = [0,0] + [1,1]\varepsilon_2) = ]-ulp(1), ulp(1)[$ as well. But $\Gamma(u_1 \sqcap u_2) = \Gamma([0,0]) = [0,0]$ is not equal
to $\Gamma(u_1) \cap \Gamma(u_2) = ]-ulp(1), ulp(1)[$, so there cannot be a left adjoint to $\Gamma$.
3.2 A Non-standard Semantics

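We slightly change the semantics of Sect. 2 so that environments are now of
the form $\rho : Var \to (Val \times Val)$ (we will write $\rho = (\underline{\rho}, \overline{\rho})$). So $\rho(X) = (\underline{X}, \overline{X})$,
where $X$ is any variable, and the intended meaning is that $\underline{X}$ is the semantics
we had in Sect. 2 and $\overline{X}$ is the intended "real" computation (i.e. using real
numbers and not floating-point numbers). For the sake of simplicity, we only
carry on the concrete and abstract semantics without dealing with NaN or
run-time errors, hence dropping the error part of the semantics. For expressions, for
instance, we find the following new concrete operators on pairs $a = (\underline{a}, \overline{a})$ and $b = (\underline{b}, \overline{b})$:

- $a \mathbin{\hat{+}} b = (\mathcal{M}(\underline{a} + \underline{b}), \overline{a} + \overline{b})$

- $a \mathbin{\hat{\times}} b = (\mathcal{M}(\underline{a}\,\underline{b}), \overline{a}\,\overline{b})$

- $a \mathbin{\hat{-}} b = (\mathcal{M}(\underline{a} - \underline{b}), \overline{a} - \overline{b})$

- $a \mathbin{\hat{/}} b = (\mathcal{M}(\underline{a}/\underline{b}), \overline{a}/\overline{b})$ if $\underline{b} \neq 0$ and $\overline{b} \neq 0$.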
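The pair semantics can be sketched in Python by running binary64 arithmetic (round-to-nearest, playing the role of $\mathcal{M}$) side by side with exact rational arithmetic from the `fractions` module. Exact rationals only stand in for $\mathbb{R}$ here, so square roots are out of scope; `None` plays the role of the unknown real value $\top$ after an unstable test. All names are hypothetical.

```python
import operator
from fractions import Fraction


def pair_const(text: str):
    """A decimal constant: (nearest binary64 value, exact rational value)."""
    return float(text), Fraction(text)


def lift(op):
    """Build a pair operator: rounded op on the float side, exact op on the real side."""
    def pair_op(x, y):
        (fx, rx), (fy, ry) = x, y
        # None stands for the unknown real value (top) after an unstable test.
        real = None if rx is None or ry is None else op(rx, ry)
        # Python float arithmetic is binary64 with round-to-nearest, i.e. M(fx op fy).
        return op(fx, fy), real
    return pair_op


add = lift(operator.add)
sub = lift(operator.sub)
mul = lift(operator.mul)
div = lift(operator.truediv)


def branch_less_than_zero(x):
    """Evaluate the test x < 0: return (branch taken by the float program, real value)."""
    taken = x[0] < 0
    if x[1] is None or (x[1] < 0) != taken:
        return taken, None      # unstable test: stop computing the real semantics
    return taken, x[1]
```

For example, `sub(pair_const("0.3"), add(pair_const("0.1"), pair_const("0.2")))` has a tiny negative float component but a real component equal to 0, so the test x < 0 is unstable on it, much like the test discussed next.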

The rest of the semantics is pretty much the same, "executing in parallel"
the program with floating-point operations and the program with operations
in $\mathbb{R}$, but without observing precisely the steps in the "real" computation. For
instance, we have a transition from $(x = Expr; Prgm, (\underline{\rho}, \overline{\rho}))$ to $(Prgm, ([\![x = Expr]\!]^f\underline{\rho}, [\![x = Expr]\!]\overline{\rho}))$, where $[\![.]\!]^f$ is the "floating-point" semantics given in
Sect. 2 and $[\![.]\!]$ is a similar semantics, but with operations and numbers in $\mathbb{R}$
(the "ideal semantics"). The more difficult part of the semantics is tests (and
while loops, of course, since they include tests). The problem is that a test in the
floating-point semantics might well not give the same result as the test in the
real number semantics (as in the introductory example), leading to a different flow of execution
in the two semantics. We choose in that case to stop computing the real number
semantics: for instance, there is a transition from (if (x < 0) x = x + 1 else x = x - 1, $(x \leftarrow (-10^{-k}, 0))$), for some large $k$, to (x = x + 1, $(x \leftarrow (-10^{-k}, \top))$), and then to
the final state with the empty program and $(x \leftarrow (1, \top))$.* This actually corresponds to a synchronized product
(with synchronization between two transitions being only allowed when the two
have the same labels) of the transition system corresponding to the floating-point semantics with another one, corresponding to an "observer", which is the real
number semantics.

From this semantics, we can construct an even more detailed semantics, which
will be our final non-standard semantics, and which goes (briefly) as follows. We
define inductively the notion of "inaccuracy" coming from a location (i.e. a
transition), which we identify with a "formal" variable $\varepsilon_i$. Consider a trace $s$ of
execution from an initial environment $(\underline{\rho}, \overline{\rho})$. Suppose this trace goes through
locations $\varepsilon_1$ to $\varepsilon_{j-1}$ and that variables $x$ and $y$ are computed on this trace; we
suppose we can write formally (this is the induction step of the definition) $x = x_0 + \sum_{i=1}^{j-1} x_i\varepsilon_i$ and $y = y_0 + \sum_{i=1}^{j-1} y_i\varepsilon_i$, where $\varepsilon_i$ is a formal variable of magnitude
$\varepsilon = ulp(1)$. Here $x_0$ (respectively $y_0$) is the value computed with the semantics of real
numbers on $s$ from $\overline{\rho}$, and $x_i$ is the magnitude (divided by $\varepsilon$) of the error of the
computed result in the semantics of floating-point numbers starting at $\underline{\rho}$, due
to the rounding operation at instruction $\varepsilon_i$. Then suppose we extend the trace
$s$ with the operation $z = x . y$. This derived semantics computes $z = z_0 + \sum_{i=1}^{j} z_i\varepsilon_i$,
with $z_i$ being a function of the $x_k$ and $y_l$ ($1 \leq k \leq j-1$ and $1 \leq l \leq j-1$).
This semantics is fully described in the forthcoming article [Mar01]. Now we are
going to abstract the coefficients $z_i$.

* We say in that case that this test is "unstable".

3.3 Abstract Semantics 

We particularize $\mathcal{D}$ with $A$ being the interval lattice (but this is easy to generalize
to any non-relational abstract domain) and $\mathcal{L}$ being the set of locations
(identified again with a subset of $\mathbb{N}$). We will only deal here with a forward
abstract semantics. Of course, having a backward abstract semantics would enable us to gain more precision during the analysis, by alternating forward and backward iterations [CC92a], but this is outside the scope of the
paper.

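The semantics of expressions is defined using operations $+^a$, $-^a$, $\times^a$, $inv^a$ and $\sqrt{\cdot}^a$,
as follows: let $x = [a_0, b_0] + \sum_{i=1}^{j-1}[a_i, b_i]\varepsilon_i$ and $y = [c_0, d_0] + \sum_{i=1}^{j-1}[c_i, d_i]\varepsilon_i$ be
two affine forms. We are trying to find a good abstraction for an operation $.$
on $x$ and $y$ at location $j$, giving* the result $z = z_0 + \sum_{i=1}^{j} z_i\varepsilon_i$. The abstract
semantic functions are (using $\oplus$, $\ominus$, etc. of Sect. 3, where we assume $\mathcal{M}' = \mathcal{M}$):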

$$x +^a y = ([a_0,b_0] \oplus [c_0,d_0]) + \sum_{i=1}^{j-1}([a_i,b_i] \oplus [c_i,d_i])\varepsilon_i + (\alpha\circ\Gamma(x) \oplus \alpha\circ\Gamma(y))\varepsilon_j$$

This merely translates the fact that the "real value" of the sum should be in the
sum of the (floating-point) intervals containing the real values of $x$ and $y$. The
errors coming from $\varepsilon_i$ must be over-approximated by the sum of the errors made in computing
$x$ and $y$ at $\varepsilon_i$. The last term (factor of $\varepsilon_j$) is due to the rounding of the "real"
sum operation at $c_j$. Because of the IEEE 754 standard, its magnitude is at most
$ulp(z)$, where $z$ is the floating-point sum of $x$ and $y$. It is easy to see that it is
less than or equal to $(\alpha\circ\Gamma(x) \oplus \alpha\circ\Gamma(y))\varepsilon_j$. The other rules are:
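A minimal Python sketch of the $+^a$ rule, assuming $\mathcal{M}' = \mathcal{M}$ as in the text and, further, ignoring the analyzer's own rounding (so plain binary64 sums are used for interval bounds). An affine form is represented as a pair (central interval, dictionary from location to coefficient interval); $ulp(1)$ is assumed to be $2^{-23}$. Names are hypothetical.

```python
ULP1 = 2.0 ** -23  # assumed value of ulp(1) (single precision)


def iadd(p, q):
    return p[0] + q[0], p[1] + q[1]


def hull(form):
    """alpha(Gamma(form)) as a closed interval; form = (center, {location: coeff})."""
    center, coeffs = form
    radius = ULP1 * sum(max(abs(lo), abs(hi)) for lo, hi in coeffs.values())
    return center[0] - radius, center[1] + radius


def affine_add(x, y, j):
    """x +^a y at location j (no outward rounding of the bounds in this sketch)."""
    (x0, xs), (y0, ys) = x, y
    zero = (0.0, 0.0)
    # Errors inherited from earlier locations simply add up.
    zs = {i: iadd(xs.get(i, zero), ys.get(i, zero)) for i in set(xs) | set(ys)}
    zs[j] = iadd(hull(x), hull(y))  # rounding of the sum itself, at location j
    return iadd(x0, y0), zs
```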

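$$x -^a y = ([a_0,b_0] \ominus [c_0,d_0]) + \sum_{i=1}^{j-1}([a_i,b_i] \ominus [c_i,d_i])\varepsilon_i + (\alpha\circ\Gamma(x) \ominus \alpha\circ\Gamma(y))\varepsilon_j$$

$$x \times^a y = ([a_0,b_0] \otimes [c_0,d_0]) + \sum_{i=1}^{j-1}\big((\alpha\circ\Gamma(y) \otimes [a_i,b_i]) \oplus (\alpha\circ\Gamma(x) \otimes [c_i,d_i])\big)\varepsilon_i + (\alpha\circ\Gamma(x) \otimes \alpha\circ\Gamma(y))\varepsilon_j$$

$$inv^a(x) = inv^\circ([a_0,b_0]) - \sum_{i=1}^{j-1}\big(inv^\circ(\alpha\circ\Gamma(x) \otimes \alpha\circ\Gamma(x)) \otimes [a_i,b_i]\big)\varepsilon_i + inv^\circ(\alpha\circ\Gamma(x))\varepsilon_j$$

$$\sqrt{x}^a = \sqrt{[a_0,b_0]}^\circ + \sum_{i=1}^{j-1}\big(inv^\circ([2,2] \otimes \sqrt{\alpha\circ\Gamma(x)}^\circ) \otimes [a_i,b_i]\big)\varepsilon_i + \sqrt{\alpha\circ\Gamma(x)}^\circ\varepsilon_j$$

* Note that we have used a suitable relabelling of control points, identifying them with
an interval of integers.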

The correctness of this semantics with respect to the semantics of Sect. 2 is
expressed as follows: let $x$ and $y$ be two affine forms, and let $.$ (respectively $h$)
be any of the operations $+$, $-$, $\times$ (respectively $inv$, $\sqrt{\cdot}$); then $\Gamma(x) . \Gamma(y) \subseteq \Gamma(x .^a y)$ (respectively $h(\Gamma(x)) \subseteq \Gamma(h^a(x))$).

Addition and subtraction rules are easy, even if they are in fact too approximate,
because we always say that the operation might create an inaccuracy of
up to one ulp around the result (which is, by the IEEE 754 norm, at most
$ulp(|x.y|) \leq |x.y|\,ulp(1)$).

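Let us show for instance the correctness of $\times^a$. Take $U = U_0 + \sum_{i=1}^{j-1} U_i\varepsilon_i$
and $V = V_0 + \sum_{i=1}^{j-1} V_i\varepsilon_i$ two affine forms, with $U_i = [a_i, b_i]$ and $V_i = [c_i, d_i]$. Let
$u \in \Gamma(U)$ and $v \in \Gamma(V)$. We write $u = u_0 + \sum_{i=1}^{j-1} u_i\varepsilon$ (where $\varepsilon = ulp(1)$) and
$v = v_0 + \sum_{i=1}^{j-1} v_i\varepsilon$, where $u_i \in \gamma(U_i)$ and $v_i \in \gamma(V_i)$. We consider the floating-point product of $u$ and $v$, $\mathcal{M}(uv)$.
We have $uv = u_0v_0 + \sum_{i=1}^{j-1}(u v_i + v_0 u_i)\varepsilon$, so

$$\begin{aligned}
\mathcal{M}(uv) &\leq uv + |uv|\,\varepsilon\\
&\leq u_0v_0 + \sum_{i=1}^{j-1}(u v_i + v_0 u_i)\varepsilon + |uv|\,\varepsilon\\
&\leq \max([a_0,b_0] \otimes [c_0,d_0]) + \sum_{i=1}^{j-1}\max(\Gamma(U) \otimes [c_i,d_i] \oplus [a_i,b_i] \otimes \Gamma(V))\,\varepsilon + \max(|\Gamma(U)\Gamma(V)|)\,\varepsilon
\end{aligned}$$

which shows one part of the inclusion. The rest is left to the reader.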
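The $\times^a$ rule can be sketched in the same way, under the same assumptions as for $+^a$ ($ulp(1)$ taken as $2^{-23}$, no outward rounding of the interval bounds, hypothetical names). The coefficient of each inherited error is multiplied by the hull of the other operand, which is how the "formal derivative" of the product shows up.

```python
ULP1 = 2.0 ** -23  # assumed value of ulp(1) (single precision)


def iadd(p, q):
    return p[0] + q[0], p[1] + q[1]


def imul(p, q):
    products = [p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1]]
    return min(products), max(products)


def hull(form):
    """alpha(Gamma(form)) as a closed interval; form = (center, {location: coeff})."""
    center, coeffs = form
    radius = ULP1 * sum(max(abs(lo), abs(hi)) for lo, hi in coeffs.values())
    return center[0] - radius, center[1] + radius


def affine_mul(x, y, j):
    """x *^a y at location j, following the rule above."""
    (x0, xs), (y0, ys) = x, y
    hx, hy = hull(x), hull(y)
    zero = (0.0, 0.0)
    zs = {i: iadd(imul(hy, xs.get(i, zero)), imul(hx, ys.get(i, zero)))
          for i in set(xs) | set(ys)}
    zs[j] = imul(hx, hy)  # rounding of the product itself, at location j
    return imul(x0, y0), zs
```

For instance, multiplying the exact constant 2 by $3 + [1,1]\varepsilon_1$ gives the coefficient $[2,2]$ for $\varepsilon_1$: the inherited error is doubled.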

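For the formula for $inv$, the abstraction is correct since, for $x \in [a,b]$ and $x + \delta \in [a,b]$ with $ab > 0$ (so that $x \mapsto 1/x$ is defined on the whole interval),

$$\frac{1}{x+\delta} = \frac{1}{x} - \frac{\delta}{x(x+\delta)} \in \frac{1}{x} - \delta \cdot inv^\circ([a,b] \otimes [a,b]).$$

The same kind of argument applies to the square root, which is concave on $[0, +\infty[$: for $0 < a$, $x \in [a,b]$, $\delta \geq 0$ and $x + \delta \in [a,b]$, we approximate it by

$$\sqrt{x} + \frac{\delta}{2\sqrt{x+\delta}} \leq \sqrt{x+\delta} = \sqrt{x} + \frac{\delta}{\sqrt{x} + \sqrt{x+\delta}} \leq \sqrt{x} + \frac{\delta}{2\sqrt{x}}.$$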

Of course, there are other ways to give lower and upper bounds to these computations. The choice we have taken is that the individual coefficients of the $\varepsilon_i$
should reflect the magnitude of the error coming from the computation at $c_i$ in
the total error on a trace of computation. Of course this depends on the "formal
derivatives" that appear in these formulae.

The correctness of this semantics with respect to the semantics of Sect. 3.2 is
as follows (more details in the forthcoming article [Mar01]). On the set of paths
that Prgm can execute from a set of initial environments, if $x$ (respectively $y$)
has real value in $[a_0, b_0]$ (respectively $[c_0, d_0]$) and errors coming from location
$\varepsilon_i$ ($i \in \{1, \ldots, j-1\}$) in $[a_i, b_i]$ (respectively $[c_i, d_i]$), then on the set of
paths that Prgm; z = x.y can execute from the same environments, $z$ has real
value in $z_0$, and the error coming from location $\varepsilon_i$ is in $z_i$, plus the error in the
computation of the operation $.$ itself (represented by $\varepsilon_j$). The effect of adding
the operation z = x.y on the magnitude of the error coming from location $\varepsilon_i$
($i < j$) is reflected by the derivative of the operation in question.

We have not spoken of the abstract semantics of constants and tests. Constants are easy to abstract: the IEEE 754-1985 rules dealing with constants are
very precise and we can determine for sure whether we lose precision or not.
Tests are more complex. We use local decreasing iterations as in [Gra92]. For
instance, suppose we want to interpret x == y. The corresponding abstract operator $==^a$ will be the greatest fixed point of the functional $F$ on affine forms
which, to every pair of affine forms ($x = x_0 + \sum_i x_i\varepsilon_i$, $y = y_0 + \sum_i y_i\varepsilon_i$), associates
($x' = x'_0 + \sum_i x'_i\varepsilon_i$, $y' = y'_0 + \sum_i y'_i\varepsilon_i$) with (each component of the functional is
in $A$, i.e. here the lattice of intervals)

$$\begin{cases} x'_0 = x_0 \cap \alpha\circ\Gamma(y - \sum_i x_i\varepsilon_i)\\ x'_i = x_i\\ y'_0 = y_0 \cap \alpha\circ\Gamma(x - \sum_i y_i\varepsilon_i)\\ y'_i = y_i\end{cases}$$

This is not the best abstraction (on the coefficients of the $\varepsilon_i$) but it is enough to
show that some tests might be unstable (when the order of magnitude of the $x_i$
or $y_i$ is not negligible with respect to $x_0$ and $y_0$ respectively).
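A minimal Python sketch of these local decreasing iterations for x == y, under the same simplifying assumptions as before ($ulp(1) = 2^{-23}$, error coefficients given as aligned lists of intervals, no outward rounding, hypothetical names). Only the central intervals are refined, exactly as in $F$; `None` means the test is certainly false.

```python
ULP1 = 2.0 ** -23  # assumed value of ulp(1)


def meet(p, q):
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None  # None stands for the empty interval


def shifted_hull(center, own, other):
    """alpha(Gamma(other - sum_i own_i eps_i)).

    `center` is the other form's central interval; `other` and `own` are the
    error coefficients of the other form and of the form being refined.
    """
    radius = ULP1 * sum(max(abs(o[0] - w[1]), abs(o[1] - w[0]))
                        for o, w in zip(other, own))
    return center[0] - radius, center[1] + radius


def abstract_equal(x, y, max_iter=50):
    """Decreasing iteration of F for the test x == y; None if it is surely false."""
    (x0, xs), (y0, ys) = x, y
    for _ in range(max_iter):
        nx0 = meet(x0, shifted_hull(y0, xs, ys))
        ny0 = meet(y0, shifted_hull(x0, ys, xs))
        if nx0 is None or ny0 is None:
            return None
        if (nx0, ny0) == (x0, y0):
            break  # greatest fixed point reached
        x0, y0 = nx0, ny0
    return (x0, xs), (y0, ys)
```

With a large error coefficient on x, the test $1 == 1.0625$ is not ruled out, which is the kind of unstable test described above; without that error, the test is found to be false.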

4 An Example 

Let us now decorate the different floating-point operations of program (A):

x=1;
y=(sqrt(5) !1! -1 !2!)/2 !3!; !4!
for (i=1; i<20; i++) {
  z=x;
  x=y;
  y=z-y; !5!
}

The semantics using affine forms* goes as follows; first, for the locations
before the loop:

!1!: $\sqrt{5} = [2.236068, 2.236069] + [2.236068, 2.236069]\varepsilon_1$

!2!: $\sqrt{5} - 1 = [1.236068, 1.236069] + [2.236068, 2.236069]\varepsilon_1 + [1.2360676, 1.2360684]\varepsilon_2$

* This comes from a library programmed in C by Nicolas Regal in 1999 [Reg99].
!3!: $1/2 = [0.5, 0.5] + [0.5, 0.5]\varepsilon_3$

!4!: $y = [0.618033936, 0.618034058] + [1.118033936, 1.118034058]\varepsilon_1 + [0.6180338176, 0.6180342272]\varepsilon_2 + [0.6180340736, 0.6180342784]\varepsilon_3 + [0.6180340736, 0.6180342784]\varepsilon_4$

(Notice that 1/2 is blindly over-approximated. This could be done exactly in a
more refined semantics.) We can then look at the abstract values on the first
unfoldings of the loop. For instance, in the first loop iteration we find the abstract value
for y to be:

$$[0.381966004, 0.381966126] + [-1.118033935, -1.118033813]\varepsilon_1 + [-0.6180341760, -0.6180337664]\varepsilon_2 + [-0.6180342272, -0.6180340224]\varepsilon_3 + [-0.6180342272, -0.6180340224]\varepsilon_4 + [0.3819656448, 0.3819658240]\varepsilon_5$$

(concretization is [0.38196568, 0.38196583]). Then,

$$[0.236067780, 0.236068024] + [2.236067872, 2.236068116]\varepsilon_1 + [1.2360676352, 1.2360684544]\varepsilon_2 + [1.2360681472, 1.2360685568]\varepsilon_3 + [1.2360681472, 1.2360685568]\varepsilon_4 + [-0.1458974464, -0.1458969344]\varepsilon_5$$

(concretization is [0.23606846, 0.23606873]). Then,

$$[0.145897995, 0.145898346] + [-3.354102050, -3.354101562]\varepsilon_1 + [-1.854102528, -1.8541012992]\varepsilon_2 + [-1.854102528, -1.8541021184]\varepsilon_3 + [-1.854102528, -1.8541021184]\varepsilon_4 + [0.2917938944, 0.2917948160]\varepsilon_5$$

(concretization is [0.14589696, 0.14589734]). Then,

$$[0.090169434, 0.090170029] + [5.590169434, 5.590170411]\varepsilon_1 + [3.090168832, 3.0901712896]\varepsilon_2 + [3.0901702656, 3.0901714944]\varepsilon_3 + [3.0901702656, 3.0901714944]\varepsilon_4 + [-0.2016236672, -0.2016221184]\varepsilon_5$$

(concretization is [0.09017118, 0.09017179]). Then again (the fifth time we go
around the loop):

$$[0.055727959, 0.055728913] + [-8.944272460, -8.944270507]\varepsilon_1 + [-4.9442738176, -4.9442693120]\varepsilon_2 + [-4.9442742272, -4.9442717696]\varepsilon_3 + [-4.9442742272, -4.9442717696]\varepsilon_4 + [0.2573515008, 0.2573540352]\varepsilon_5$$
(concretization is [0.05572883, 0.05573179]). We see that the coefficients of the $\varepsilon_i$
for $i$ up to 4 get bigger and bigger as the expected value gets smaller and smaller.
The subtraction at control point 5 does not lose much precision as such. This
means that the loop magnifies the initial error of computation of y at each turn. This
is an example of bad conditioning. For the well-conditioned example computing
$(-1/3)^n$, the computation with affine forms would show that there is no problem.
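To see this magnification concretely, outside of any abstract interpretation, the following Python sketch runs program (A) in emulated single precision and in binary64 and compares the result with $p_0^{20}$, where $p_0 = (\sqrt{5}-1)/2$ is the exact value the recurrence should track. Single precision is emulated by rounding each binary64 result to binary32 with `struct`; for the operations used here this gives the correctly rounded binary32 result. The function names are hypothetical.

```python
import math
import struct


def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def program_a(rnd):
    """Program (A) with every operation rounded by `rnd`; returns y after the loop."""
    x = rnd(1.0)
    y = rnd(rnd(rnd(math.sqrt(5.0)) - 1.0) / 2.0)
    for _ in range(1, 20):        # for (i=1; i<20; i++)
        z = x
        x = y
        y = rnd(z - y)
    return y


exact = ((math.sqrt(5.0) - 1.0) / 2.0) ** 20   # the ideal value p0 ** 20
print(program_a(f32), program_a(float), exact)
```

The single-precision result has no correct significant digit, whereas the binary64 run, whose initial error is about $2^{-29}$ times smaller, stays close to $p_0^{20}$ after 19 turns.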

Of course, in general we cannot unfold loops like that in a static analyzer. 
After some number of unfoldings, we use our widening operator, which would 
predict a huge potential loss of precision. We need better widening operators in 
general. 



5 Improvements 

5.1 Affine Interval Transformations



The idea is to consider that the semantics creates dependencies between the
coefficients (due to an inaccuracy at location $i$) that we can approximate by
linear dependencies. This choice is motivated by the fact that a great deal of
numerical codes compute affine operations (and sometimes quadratic ones, in finite
element methods). It is also motivated by the fact that we know in general how
to linearize errors, and we know how to manipulate affine constraints (which are
used, for instance, in static analysis).

We call $T$ an affine transformation on the space generated by $\{\varepsilon_1, \ldots, \varepsilon_n\}$
if there exist an $n \times n$ matrix $A$ and an $n$-dimensional vector $B$ such that for
all vectors $X$, $T(X) = AX + B$. We can represent such a transformation by the
pair $(A, B)$. We abstract a set of affine transformations by abstracting the sets
of its coefficients in $A$ and $B$ by elements of the abstract domain $A$. In fact the semantics of Sect. 3.2
gives a set of such transformations, each over-approximating the effect of each
trace.

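For instance, setting $A$ to be the interval domain, a pair such as

$$\left(\begin{pmatrix}[\ldots] & [2,3]\\ [0,0] & [1,2]\end{pmatrix}, \begin{pmatrix}[3,3]\\ [4,5]\end{pmatrix}\right)$$

abstracts all the affine transformations $(A, B)$ of a two-dimensional space whose coefficients lie in the corresponding intervals.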



The abstract domain of affine error dependence $\mathcal{T}$ is therefore isomorphic to
$A^{(nm)^2 + nm}$
($n$ is the number of control points, $m$ is the number of variables),
with component-wise ordering. This means that, as for affine intervals, if $A$ is a
lattice, then $\mathcal{T}$ is a lattice with intersection and union computed pointwise, etc.

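The concretization function $G$ goes from $\mathcal{T}$ to the set of all (concrete) affine
transformations. $G(A' = (a'_{i,j})_{1 \leq i,j \leq nm}, B' = (b'_j)_{1 \leq j \leq nm})$ is the set of affine
transformations $(A = (a_{i,j})_{1 \leq i,j \leq nm}, B = (b_j)_{1 \leq j \leq nm})$ with $a_{i,j} \in \gamma(a'_{i,j})$ (for
all $1 \leq i,j \leq nm$) and $b_j \in \gamma(b'_j)$ (for all $1 \leq j \leq nm$).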

What is important to see now is that these abstract affine transformations act
on elements of $\mathcal{D}$, because we can use the semantics of the operations $+$, $\times$ in $A$
to compute a safe approximation of $\{AX + B \mid (A, B) \in G(A', B'), X \in \Gamma(X')\}$.
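A minimal Python sketch of this action: an interval matrix and an interval vector are applied to an interval vector row by row, using interval $+$ and $\times$. The analyzer's own rounding is ignored, the function names are hypothetical, and the numbers used in the check below are illustrative (they reuse intervals from the example above, with a hypothetical $[1,1]$ as the top-left entry).

```python
def iadd(p, q):
    return p[0] + q[0], p[1] + q[1]


def imul(p, q):
    products = [p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1]]
    return min(products), max(products)


def apply_interval_affine(A, B, X):
    """Enclose {a x + b | a in G(A), b in G(B), x in X}, row by row.

    A is a list of rows of intervals; B and X are lists of intervals.
    """
    result = []
    for row, b in zip(A, B):
        acc = b
        for a_ij, x_j in zip(row, X):
            acc = iadd(acc, imul(a_ij, x_j))
        result.append(acc)
    return result
```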
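For instance, the instruction z = x + y at location $j$ will be written in
matrix form as follows (we only represent a sub-block of the complete matrix here):

$$\begin{pmatrix}\varepsilon_0(x)\\ \vdots\\ \varepsilon_n(x)\\ \varepsilon_0(y)\\ \vdots\\ \varepsilon_n(y)\\ \varepsilon_0(z)\\ \vdots\\ \varepsilon_n(z)\\ \varepsilon_j(z)\end{pmatrix} = \begin{pmatrix} I & 0\\ 0 & I\\ I & I\\ 0 & 0\end{pmatrix}\begin{pmatrix}\varepsilon_0(x)\\ \vdots\\ \varepsilon_n(x)\\ \varepsilon_0(y)\\ \vdots\\ \varepsilon_n(y)\end{pmatrix} + \begin{pmatrix}0\\ 0\\ 0\\ \alpha\circ\Gamma(x) \oplus \alpha\circ\Gamma(y)\end{pmatrix}$$

where $I$ is the $(n+1) \times (n+1)$ identity block and the last row corresponds to the new error term $\varepsilon_j(z)$.

We do not write the other rules since they are the transcription, on affine interval matrices, of what we have
seen for affine interval forms.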



5.2 Principle of the Improvement 

Let $\mathcal{X}$ be the product of the domain $Var \to \mathcal{D}$ (each variable is associated with
an affine interval form) with $\mathcal{T}$. An abstract value in $\mathcal{X}$ is a pair $(f = \lambda x.\alpha_0(x) + \sum_i \alpha_i(x)\varepsilon_i, (A, B))$ which describes an abstract state at some location $c_j$ for
which the value of variable $x$ is in $\gamma(\alpha_0(x))$ plus inaccuracy errors of order
$\gamma(\alpha_i(x))$ coming from control point $i$, together with the abstract affine transformation approximating the way these inaccuracy errors have been transformed
by the instructions just before location $c_j$. This extra information added to $f$
would not be necessary if the control flow of the program we are analyzing were
acyclic. It is only when we need infinite least fixed point iterations that we can
benefit from the approximation of the transformation of errors that take place at
certain control points (given by $(A, B)$) to widen the iterations. So in practice,
the abstract affine transformations will be managed at some suitable widening
points, as defined for instance in [Bou90, Bou98] (heads of loops, return sites in
the case of inter-procedural analysis of mutually recursive functions).

We use this extra information to obtain better widening operators. In fact,
we approximate the abstract affine transformation by a transformation that
multiplies by an upper approximation of the spectral radius of the transformation
$(A, B)$. We can then look at the asymptotic value: $\lim_{n \to \infty} A^n X_0 + \sum_{k=0}^{n-1} A^k B$.

Unfortunately, most of the "interesting properties" that we might want to
compute on $G(A', B')$ are NP-complete. Among these interesting properties are
the property of having all the transformations invertible, or the determination
of the spectrum of all the transformations ("spectral portrait"). So we need to
approximate further, so that we can compute in an efficient manner a good upper
approximation of the spectral radius of $(A, B)$.

"Any" norm on matrices can be used to determine an approximation of the
spectral radius. If $A = (a_{i,j})$, then $\|A\|_1 = \sum_{i,j}|a_{i,j}|$, $\|A\|_2 = (\sum_{i,j}|a_{i,j}|^2)^{1/2}$,
$\ldots$, $\|A\|_\infty = \max_{i,j}\{|a_{i,j}|\}$ can all be used, because the maximal modulus $\lambda_{max}$ of the
eigenvalues of $A$ satisfies $\lambda_{max} \leq \|A\|$ ($n\|A\|_\infty$ in the last case). This is not very precise,
though.
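These three bounds are straightforward to compute; a small Python sketch (hypothetical function name) follows. In the check, it is applied to the $3 \times 3$ error-propagation matrix of program (A) discussed below, whose spectral radius is the golden ratio $1.618\ldots$.

```python
import math


def norm_bounds(A):
    """Upper bounds on the spectral radius from the three matrix norms above."""
    n = len(A)
    entries = [abs(a) for row in A for a in row]
    return {
        "sum": sum(entries),                                   # ||A||_1 (entrywise)
        "frobenius": math.sqrt(sum(e * e for e in entries)),   # ||A||_2
        "n_times_max": n * max(entries),                       # n * ||A||_inf
    }
```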

There are in fact better algorithms to calculate this spectral radius. A very
famous method (power iteration) is as follows. Let $A = (a_{i,j})_{1 \leq i,j \leq n}$ be a matrix,
$u$ any non-null vector, and $(q_k)_{k \geq 0}$ the sequence $q_0 = \frac{\|Au\|}{\|u\|}, \ldots, q_k = \frac{\|A^{k+1}u\|}{\|A^k u\|}, \ldots$;
$q_k$ converges (when $k \to +\infty$) towards the greatest modulus of the eigenvalues
of $A$ (for a generic $u$, when a single eigenvalue has maximal modulus). Unfortunately this is not of much use in an abstract calculus since we do
not know if any of the iterates are upper approximations of the spectral radius.
We only have a weaker result, about the convergence of the $(q_k)$ sequence.
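A short Python sketch of the power iteration with the Euclidean norm (hypothetical function name). The vector is renormalized at each step, so $\|A v\|$ equals $\|A^{k+1}u\|/\|A^k u\|$. On the matrix of program (A) discussed below, the first iterate is about 0.816, well below the spectral radius 1.618..., which illustrates why the iterates cannot be used as safe upper bounds.

```python
import math


def power_iteration(A, u, steps=60):
    """Return q_k = ||A^{k+1} u|| / ||A^k u|| after `steps` steps (Euclidean norm)."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    v = [x / norm(u) for x in u]
    q = 0.0
    for _ in range(steps):
        w = matvec(A, v)
        q = norm(w)              # ||A v|| with ||v|| = 1
        v = [x / q for x in w]   # renormalize to avoid overflow
    return q
```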

There is a nicer approximation, which only uses "lattice-theoretic" notions.
It is called the "Gerschgorin discs":

Lemma 2. Let $A = (a_{i,j})_{1 \leq i,j \leq n}$ be a matrix. The spectrum of $A$ is contained
in $D_1 \cap D_2$ (in the complex plane), with

- $D_1 = \bigcup_{1 \leq i \leq n} D_{1,i}$,

- $D_2 = \bigcup_{1 \leq j \leq n} D_{2,j}$,

- $D_{1,i}$ is the disc with center $a_{i,i}$ and radius $r_{1,i} = \sum_{1 \leq j \leq n, j \neq i} |a_{i,j}|$,

- $D_{2,j}$ is the disc with center $a_{j,j}$ and radius $r_{2,j} = \sum_{1 \leq i \leq n, i \neq j} |a_{i,j}|$.

A good approximation of the biggest absolute value of the eigenvalues of a real matrix
$A$ is thus

$$G(A) = \max\{|a_{i,i}| + \max\{r_{1,i}, r_{2,i}\} \mid 1 \leq i \leq n\}$$
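$G(A)$ can be sketched in Python as follows (hypothetical function name). It is a safe upper bound on the spectral radius because every eigenvalue lies in some row disc $D_{1,i}$, so its modulus is at most $|a_{i,i}| + r_{1,i}$. For the matrix of program (A) discussed below, it returns 2.

```python
def gershgorin_bound(A):
    """G(A) = max_i (|a_ii| + max(r1_i, r2_i)), with row and column radii."""
    n = len(A)
    best = 0.0
    for i in range(n):
        r_row = sum(abs(A[i][j]) for j in range(n) if j != i)   # r_{1,i}
        r_col = sum(abs(A[j][i]) for j in range(n) if j != i)   # r_{2,i}
        best = max(best, abs(A[i][i]) + max(r_row, r_col))
    return best
```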

Consider again the introductory example, and in particular program (A). The analysis
using affine transformations basically discovers something that numerical analysts are
used to. At each iteration of the loop, the errors on $x$, $y$ and $z$ are transformed by the affine transformation
(we give here only a sub-block of the complete matrix)

$$\begin{pmatrix}\delta x\\ \delta y\\ \delta z\end{pmatrix} \mapsto A\begin{pmatrix}\delta x\\ \delta y\\ \delta z\end{pmatrix} + \begin{pmatrix}0\\ a\\ 0\end{pmatrix}, \qquad A = \begin{pmatrix}0 & 1 & 0\\ 1 & -1 & 0\\ 1 & 0 & 0\end{pmatrix},$$

where, at the first iteration of the loop, $\delta y = \varepsilon = 2^{-23}$, and $a$ is the error due
to the rounding of the operation $-$ in the loop.

We find $G(A) = 2$ (instead of $1.6180\cdots$). Then we have to notice that
$a < 2^{-23}$. In fact the affine transformation has two non-null eigenvalues $\rho_1 = (\sqrt{5}-1)/2 \approx 0.618$
and $\rho_2 = -(1+\sqrt{5})/2 < -1$. Therefore the error at the 20th iteration is of order $|\rho_2|^{20}$ times the initial errors, which is much bigger than $\rho_1^{20}$ (about $6.6 \times 10^{-5}$).

Of course, to do this we need a reduced product with at least an interval
analysis on the integer variables (to determine the right number of loop iterations). This
will not be described here.

Note that program (C) has $G(A) = 1$ (instead of 1/2, which is the exact
greatest eigenvalue), hence the inaccuracy does not increase at each iteration of
the loop.



6 Related Work 



To our knowledge, there are two main types of tools that are used to help the programmer compute with floating-point operations (see [...]
for general references).

The first type of tool uses alternative arithmetic implementations to better
match the "ideal" semantics of reals. For instance, interval arithmetic [Moo79]
implements a real number as an interval of floating-point numbers containing it. Multi-precision arithmetic [KT93, Sco89] uses variable-length floating-point numbers to approximate real number computations as closely as possible. Rational arithmetic [Bak75, Cle74, HCL+68] implements exact arithmetic for rational numbers, which can be used to approximate real numbers, for
instance using continued fractions [KM85, Sei83, Vui90]. These methods do
not solve the problem in all cases: interval arithmetic might lead to very imprecise (but true) results,* and multi-precision arithmetic and rational arithmetic may
be very costly to perform on real scientific codes.**
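The recurrence of J. M. Muller mentioned in the footnote makes the cost and precision trade-off tangible. The following Python sketch (hypothetical function name) runs it generically in binary64 and in exact rational arithmetic from the `fractions` module. The exact sequence converges to 6, while rounding errors excite the component converging to 100.

```python
from fractions import Fraction


def muller(u0, u1, steps):
    """Iterate u_{n+1} = 111 - 1130 / u_n + 3000 / (u_n * u_{n-1})."""
    prev, cur = u0, u1
    for _ in range(steps):
        prev, cur = cur, 111 - 1130 / cur + 3000 / (cur * prev)
    return cur


approx = muller(2.0, -4.0, 40)                 # binary64 floating point
exact = muller(Fraction(2), Fraction(-4), 40)  # exact rational arithmetic
print(approx, float(exact))
```

Exact rationals succeed here because the initial values are rational, but the size of numerators and denominators grows with every step, which is the cost issue mentioned in the text.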

A new and promising line of research uses domain theory and fractal encoding
of exact real numbers [ES98, EP97, Eda97, PEE97]. We do not know yet how
we can use these ideas for static analysis purposes.

Another approach is at the heart of the CADNA software. It is known
as the perturbation method (CESTAC) or stochastic arithmetic [GV88, GV92, Vig96]. The idea is that round-off errors can be modeled as quasi-Gaussian
distributions of some sort, and that a simple statistical test (Student's test) can
estimate their parameters, thus giving better approximations of real
number computations. This method and tool is probably one of the most favored
among programmers of numerical codes, since it is quite precise. But it has drawn some
criticisms, see for example [...]. We review some of its central ideas more



* See the previous sections for examples of that phenomenon.

** And they do not solve the problem in general: the following classical
dynamical system example by J. M. Muller, $u_{n+1} = 111 - \frac{1130}{u_n} + \frac{3000}{u_n u_{n-1}}$ with $u_0 = 2$
and $u_1 = -4$, leads, with floating-point numbers of finite precision up to a hundred
decimal digits, to 100, whereas the exact value of the limit is 6. Limits of some other
dynamical systems [Hea01] can only be computed with exact real numbers, i.e. infinite-precision arithmetic!
in detail in Sect. 6.2. A similar approach (still stochastic, but in a "backwards"
manner) has been implemented and tested under the name "Precise".
We thought of using similar probabilistic ideas as a basis for a static analysis,
but at the moment there seems to be no way to ensure (even probabilistically)
the correctness of the approach. Very important work is being carried out on
the foundations of probabilistic abstract interpretation [Mon00, Mon01], which
we might use for this purpose later on.

In some applications (image synthesis, etc.), some algorithmic geometry libraries have
been specifically designed, like CGAL, in order to use the IEEE 754 floating-point
numbers in a very controlled manner for some computations.
We do not know yet how to use this for our purposes.

On the more mathematical side, let us mention the work done in automatic differentiation (which could lead to interesting connections with our work) [...]
and some specific studies of the precision of some numerical schemes, for
which we direct the reader to the general reference [...] and also to the article [...].

We know of only one example of a static analyzer (such as we are discussing
in this article) that not only tries to give more precise results on one execution,
or some hint about the precision of one execution, but rather assesses a
property valid for all (or a large class of) executions. This analyzer [ACFG92]
actually uses abstract interpretation. We will explain the abstraction chosen
in more detail in the next section.



6.1 Another Abstract Interpretation 

The abstract interpreter [ACFG92] is based on the following underlying concrete
model. Floating-point numbers are identified with a pair $f = (m, e)$ with a
mantissa $m \in M = \{m \in \mathbb{Z} \mid p \in \mathbb{N}, -(10^p - 1) \leq m \leq 10^p - 1\}$ and an
exponent $e \in E = \{e \in \mathbb{Z} \mid q \in \mathbb{N}, -q \leq e \leq q\}$.

Abstract values are $\hat{f} = (s, p, e_m, e_M)$, where $s \in \{+, -, \top\}$ is an abstract
sign, $p$ is an integer representing the "number of significant digits" and $[e_m, e_M]$
is the "interval of exponents". For instance, $\langle +, 4, 6, 12 \rangle$ denotes, in the concrete
model,

$$F' = \{0.1000 \times 10^6, 0.1001 \times 10^6, \ldots, 0.9999 \times 10^{12}\}$$

It is proved to form a complete lattice with a Galois connection with the set 
of subsets of the concrete model. Abstract operations like Add, Sub, Mul, Div 
are defined. Unfortunately, no interpretation of tests nor of loops is made, thus 
greatly restricting the precision of the analysis. The analysis is sometimes even 
weaker than an interval analysis of floating-point numbers, thus much weaker 
than our analysis. 

6.2 CADNA 



CADNA implements the CESTAC method [GV88, GV92, FV74, VA85, Vig93]. The underlying model is that the round-off errors are of the
form $2^{-p}\alpha_i$, where the $\alpha_i$ are equidistributed independent random variables
on $]-1, 1[$ with uniform law. This law is justified both experimentally and by
some theoretical reasons [...]. The perturbation method goes as follows. At
each floating-point operation, we perturb the round-off towards $+\infty$ or $-\infty$
with the same probability. We execute each of the instructions of the
program $N$ times (in practice, $N = 2$ or 3). The mean value "converges" towards the
exact mathematical result, and the Student test computes a good approximation
of the standard deviation of the quasi-Gaussian distribution law of the result.
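A rough Python sketch of the perturbation idea, under simplifying assumptions. Instead of choosing the rounding direction of each operation, it perturbs a round-to-nearest binary64 result by one ulp up or down with probability 1/2, using `math.nextafter` (Python 3.9+). It then estimates the number of exact significant digits with the usual CESTAC-style formula $\log_{10}(\sqrt{N}\,|\bar{x}| / (\tau\,\sigma))$, where $\tau$ is a two-sided 95% Student quantile. All names are hypothetical; this is not CADNA's interface.

```python
import math
import random


def perturb(x: float, rng: random.Random) -> float:
    """Move a round-to-nearest result one ulp up or down, each with probability 1/2."""
    return math.nextafter(x, math.inf if rng.random() < 0.5 else -math.inf)


STUDENT_T_95 = {2: 12.706, 3: 4.303}  # two-sided 95% quantiles, N - 1 degrees of freedom


def cestac(run, n=3, seed=0):
    """Run `run(p)` n times, where p perturbs each result; estimate exact digits."""
    rng = random.Random(seed)
    samples = [run(lambda v: perturb(v, rng)) for _ in range(n)]
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    if std == 0.0:
        return mean, 15.95             # all samples agree: about 53 * log10(2) digits
    if mean == 0.0:
        return mean, 0.0
    digits = math.log10(math.sqrt(n) * abs(mean) / (STUDENT_T_95[n] * std))
    return mean, max(0.0, digits)


def sum_tenths(p):
    """Hypothetical test program: add 0.1 ten times, perturbing every addition."""
    s = 0.0
    for _ in range(10):
        s = p(s + 0.1)
    return s


print(cestac(sum_tenths))
```

With only $N = 2$ or 3 samples, the estimate rests on the distributional assumptions criticized in the next paragraph.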

Thus this method can be used both for improving the computations and
for testing the relevance of a given execution. It is very much used in practice, but some bugs are known: its estimates are sometimes too optimistic. There
are several reasons why this may happen [...]. The approximation by a
Gaussian law is justified by the central limit theorem, but the convergence is very
slow on the tail of the distribution (in particular with $N = 2$ or 3). The approximation is very bad on improbable events (precisely the ones which lead to very
costly errors). In order to justify the use of the theorem, we also have to suppose
that the round-off errors are random, uncorrelated and continuously distributed
on a small interval. But in general, the errors are only due to a few round-off
errors and to singularities (which we might find out by data perturbation but
probably not by perturbation of the operations). So errors are not uncorrelated
random variables. Also, the distribution of errors is a discrete distribution on
floating-point numbers and not a continuous one on real numbers. Finally, this is only a
"first-order approximation": for multiplication and division, second-order terms
are not considered, although they may not be negligible.

7 Conclusion and Future Work 

We have presented some ideas about what static analysis can try to do for
programs computing with floating-point numbers. Our first concern in the
DAEDALUS project is to analyze control-command software (like the Patriot
software seen in the introduction), which is not numerically intensive and for
which we think we have good chances of finding nice solutions. Numeric-intensive
software, such as scientific codes, is much more complex. Some of it,
such as code for well-conditioned problems, might be amenable to static analysis.
It is more difficult to consider what can happen for ill-conditioned problems (for
example, inverting a matrix with a very small determinant). We believe that there is
little hope that an automatic tool can cope with such problems, but we would like to
be shown to be wrong.

In fact, most scientific codes pose new problems to static analyzers. For
instance, some codes rely on Monte Carlo methods or, more generally, on
randomized algorithms. We believe that the static analysis of such algorithms is an
interesting prospect, but it goes beyond the scope of this paper (see [Mon00, Mon01]
for some ideas that could constitute a good basis for future work). What might
have to be considered for these codes is that the random generators that are
used are only pseudo-random generators, which further complicates the semantic
problem. Also, some scientific codes use parallel algorithms, which make things
even more complex, especially regarding the evaluation order of floating-point
operations, which depends on the actual synchronizations between tasks. Our
future work will not try to tackle these very subtle problems. The first extension
we wish to make is to look at inverse problems, i.e., at clever backward
semantics. The aim is to solve the following problem: what precision do we need
on the inputs in order to reach a given precision level on the outputs? This last
point is particularly important for on-board software, since a wrong
estimate of the precision required for the inputs of a control/command program can be
very expensive (for instance, through the cost of very precise sensors).

Acknowledgements. Thanks are due to M. Martel, S. Putot and N. Williams 
for careful proof-reading of this article. 
Abstract. This paper addresses the following question: Do scalable
control-flow-insensitive pointer analyses provide the level of precision
required to make them useful in compiler optimizations?

We first describe alias frequency, a metric that measures the ability of 
a pointer analysis to determine that pairs of memory accesses in C pro- 
grams cannot be aliases. We believe that this kind of information is 
useful for a variety of optimizations, while remaining independent of a 
particular optimization. We show that control-flow- and context-insensitive
analyses provide the same answer as the best possible pointer analysis
on at least 95% of all statically generated alias queries. In order to
understand the potential run-time impact of the remaining 5% of queries,
we weight the alias queries by dynamic execution counts obtained from
profile data. Flow-insensitive pointer analyses are accurate on at least
95% of the weighted alias queries as well.

We then examine whether scalable pointer analyses are inaccurate on the
remaining 5% of alias queries because they are context-insensitive. To this
end, we have developed a new context-sensitive pointer analysis that also 
serves as a general engine for tracing the flow of values in C programs. To 
our knowledge, it is the first technique for performing context-sensitive 
analysis with subtyping that scales to millions of lines of code. We find 
that the new algorithm does not identify fewer aliases than the context- 
insensitive analysis. 



1 Introduction 

Programs written in C typically make widespread use of pointer variables. In 
order to analyze a program that uses pointers, it is necessary to perform a pointer 
analysis that computes, at every dereference point in a program, a superset of 
the set of memory locations that may be accessed by the dereference. These 
“points-to” sets can be used to perform alias analysis in an optimizing compiler: 
two memory accesses whose points-to sets do not intersect cannot be aliases. 
Alias information can be utilized by a variety of optimizations, including but 
not limited to code scheduling, register allocation, loop unrolling and constant 
propagation. 

Over the years a wide variety of algorithms for pointer analysis have been
proposed. However, these algorithms either do not scale to large programs, or are
believed to produce poor alias information. This is one reason why most optimizing
compilers do not perform global pointer analysis, and are therefore forced to make
conservative assumptions about potential aliases. In this paper, we argue that
scalable pointer analyses do produce precise alias information.

We are interested in determining whether scalable pointer analyses can im- 
pact a variety of optimizations. Therefore, we avoid evaluating pointer analyses 
in the context of a specific optimization and a specific compiler. Instead, we de- 
velop a new metric, “alias frequency” , that measures the frequency with which 
a pointer analysis is forced to assert that a pair of statically generated memory 
accesses in a C program may be aliases. Our experiments show that the alias
frequency of scalable pointer analyses (in particular, Das’s algorithm) is
within 5% of the alias frequency of the best possible pointer analysis.

Although this result is extremely encouraging, we must also consider whether
the 5% of alias queries on which scalable pointer analyses are imprecise may be
the very queries that have the greatest impact on a given optimization. If so,
the code associated with these queries must dominate the run-time of the
programs, and if we weight the responses to alias queries by dynamic execution
counts from profile data, we should expect a large gap in alias frequency between
Das’s algorithm and the best possible pointer analysis. However, our experiments
show that Das’s algorithm is within 5% of the best possible pointer analysis in
terms of weighted alias frequency as well.

One possible source of the remaining inaccuracy in Das’s algorithm is its 
lack of context-sensitivity. To understand the impact of this limitation, we have 
developed a new algorithm that is a context-sensitive version of Das’s
algorithm. Our generalized one level flow (GOLF) algorithm uses the one level flow
idea from Das’s algorithm to achieve a limited form of context-sensitivity in
addition to subtyping. Our results show no appreciable decrease in alias frequency
from context-sensitivity.

GOLF is a general engine for tracing the flow of values in programs with
pointers and indirect function calls in a context-sensitive manner. It can be
used for applications such as program slicing or escape analysis. To our
knowledge, GOLF is the first context-sensitive analysis with subtyping that scales
to millions of lines of code. Even though GOLF does not improve alias frequency, it can
provide much more precise results than a context-insensitive algorithm if the
client of the analysis is itself context-sensitive (see Section ^3 . 

In summary, we make the following contributions: 

— We present “alias frequency”, a new metric for measuring the impact of 
pointer analysis on optimization. 

— We demonstrate that scalable pointer analyses are able to produce precise 
responses to at least 95% of all alias queries on all of our test programs. 

— We show that the addition of context-sensitivity does not improve the alias 
frequency of scalable pointer analyses. 

— We present GOLF, a new flow-insensitive pointer analysis that utilizes a
limited amount of context-sensitivity and subtyping. It produces a
points-to graph that is linear in the size of the program, in almost linear time.
All points-to sets in the program can be extracted from the graph using
CFL-Reachability, in worst-case cubic time.

— We show that on all of our test programs, GOLF is linear in time and space 
requirements. We also show that the limited forms of context-sensitivity and 
subtyping used in GOLF provide the same precision as algorithms with full 
polymorphism and subtyping. We therefore claim that GOLF is likely to 
provide the same precision as an algorithm with full polymorphic subtyping. 

The rest of the paper is organized as follows: in Section 2 we motivate GOLF
through an example. We describe GOLF in Section 3. In Section 4 we present
alias frequency. In Section 5 we present our empirical results. We discuss related
work in Section 6 and conclude in Section 7.

2 Example 

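Consider the fragment of a C program with function calls shown below; the
comments mark the two call sites, i and j, of id.

int x, y;
int *id(int *r) { return r; }
void example(void) {
    int *p, *q;
    p = id(&x);   /* call site i */
    q = id(&y);   /* call site j */
    *p = 3;
}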

The goal of a context-sensitive pointer analysis is to avoid confusing the addresses 
returned from the function id to the variables p and q at the two calls to id. 




Figure: (a) points-to graph computed by Das’s algorithm; (b) points-to graph computed by GOLF.



The points-to information computed by Das’s algorithm is shown in (a) above. 
The points-to graph shown contains nodes representing memory locations and 
edges representing pointer relationships. Every node contains a single pointer 
edge. Thus, the target of the node for p represents the location *p. The points-to 
graph includes special “flow” edges (labeled <) between nodes. Flow edges are 
introduced at assignments, one level below (in the points-to graph) the
expressions involved in the assignment. In the example program, the implicit
assignment from &x to parameter r induced by the function call introduces a flow edge
from the node for x to the pointer target node of r, indicating that the set of
symbols represented by *r must include x. The return statement in id induces
implicit assignments from r to p and from r to q. As a result, the set of symbols
represented by *p includes both x and y, even though there is no execution of
the program in which the address of y flows to p. As several authors have pointed
out in previous work, the problem arises because a
value flowing into id from one call site (i) is allowed to flow out to a different
call site (j).

The points-to graph produced by GOLF is shown in (b) above. We label flow 
edges arising from function calls with identifiers. All edges from a given call site 
have the same identifier. Edges also have a polarity, indicating whether a value 
is flowing in to (— ) or out from (+) the called function. From this graph, we can 
see that x need not be included in the set of symbols at *q, because the only
path from x to *q has an edge labeled i− followed by an edge labeled j+. In a
"valid" flow path, calls and returns are "matched": an edge labeled i− may be
matched only by an edge labeled i+.

A valid path in the GOLF graph is one whose sequence of labels forms a
string in a context-free language of matched i− and i+ labels. It is well known
that the presence of valid paths between a pair of nodes can be determined in
worst-case cubic time using CFL-Reachability queries.

Both Das’s algorithm and GOLF achieve scaling partly by limiting the use 
of flow edges to one level in the points-to graph, while using unification (or, type 
equality rules) to merge nodes at lower levels. Our experiments show that the 
restriction of context-sensitivity to one level does not lead to loss of precision 
compared to full context-sensitivity. 



3 GOLF: Generalized One Level Flow 



A pointer analysis can be thought of as an abstract computation that models
memory locations. Every location τ is associated with an id or set of symbols
φ, and holds some contents α (an abstract pointer value) (Figure 1(b)). A
location “points-to” another if the contents of the former is a pointer to the 
latter. Information about locations can be encoded as a points-to graph, in which 
nodes represent locations and edges represent points-to relationships. 

In Steensgaard’s unification-based algorithm, the effect of an assignment
from y to x is to equate the contents of the locations associated with y
and x. This is achieved by unifying (i.e., equating their ids and contents) the
locations pointed-to by y and x into one representative location. Das’s algorithm
extends Steensgaard’s algorithm by pushing the effect of assignment processing
one level down the chains in the points-to graph (Figure 1). The effect of an
assignment from y to x is to introduce a special "flow" edge from the pointed-to
location of y to the pointed-to location of x, and to equate only the contents of
the two pointed-to locations (Figure 1(a)). Flow edges relate ids of locations:
all of the symbols in the id of the source of a flow edge must be included in the
id of the target of the edge. Assignment processing is represented declaratively
in Figure 1(b): the type rule says that the program is correctly typed iff the
pointed-to locations of y and x have the same contents, and the id of the
pointed-to location of y is a subset of the id of the pointed-to location of x.
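A minimal Python sketch of this one-level-flow scheme follows. It covers only x = &y and x = y; the rules for x = *y and *x = y are omitted. Contents are equated with union-find as in Steensgaard’s algorithm, while ids are related only by flow edges, so the points-to set of x is recovered by collecting the symbols of every location with a flow path into the location pointed to by x. The class and method names are our own, not Das’s.

```python
class Loc:
    """Abstract location: an id (set of symbols), contents (the pointed-to
    location, or None for bottom) and outgoing flow edges."""
    def __init__(self, symbols=()):
        self.parent = self
        self.symbols = set(symbols)
        self.pointee = None
        self.flow_out = set()

def find(loc):
    while loc.parent is not loc:
        loc = loc.parent
    return loc

def unify(a, b):
    """Steensgaard-style unification: merge ids, flow edges and contents."""
    a, b = find(a), find(b)
    if a is b:
        return
    b.parent = a
    a.symbols |= b.symbols
    a.flow_out |= b.flow_out
    pa, pb = a.pointee, b.pointee
    a.pointee = pa if pa is not None else pb
    if pa is not None and pb is not None:
        unify(pa, pb)

class OneLevelFlow:
    def __init__(self):
        self.vars, self.locs = {}, []

    def _new(self, symbols=()):
        loc = Loc(symbols)
        self.locs.append(loc)
        return loc

    def loc(self, name):
        if name not in self.vars:
            self.vars[name] = self._new({name})
        return find(self.vars[name])

    def target(self, loc):
        loc = find(loc)
        if loc.pointee is None:
            loc.pointee = self._new()
        return find(loc.pointee)

    def _flow(self, src, dst):
        # One level of subtyping: id(src) must be included in id(dst) (flow edge),
        # while the contents one level further down are equated (unified).
        find(src).flow_out.add(dst)
        unify(self.target(src), self.target(dst))

    def assign_address(self, x, y):         # x = &y
        self._flow(self.loc(y), self.target(self.loc(x)))

    def assign(self, x, y):                 # x = y
        self._flow(self.target(self.loc(y)), self.target(self.loc(x)))

    def points_to(self, x):
        """Symbols of every location with a flow path into the target of x."""
        incoming = {}
        for loc in {find(l) for l in self.locs}:
            for dst in loc.flow_out:
                incoming.setdefault(find(dst), set()).add(loc)
        start = self.target(self.loc(x))
        seen, work = {start}, [start]
        while work:
            for src in incoming.get(work.pop(), ()):
                if find(src) not in seen:
                    seen.add(find(src))
                    work.append(find(src))
        return set().union(*(loc.symbols for loc in seen))

a = OneLevelFlow()
a.assign_address("p", "x")     # p = &x
a.assign_address("q", "y")     # q = &y
a.assign("p", "q")             # p = q
print(sorted(a.points_to("p")), sorted(a.points_to("q")))
```

For p = &x; q = &y; p = q, the sketch gives pts(p) = {x, y} but pts(q) = {y}; unifying the pointed-to locations of p and q, as Steensgaard’s algorithm does, would instead give {x, y} for both.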

GOLF extends Das’s algorithm by treating implicit assignments induced by
function calls in a special manner, so as to obtain context-sensitive information.
The effect of function calls in GOLF is shown in Figure 2. Parameter passing
induces flow edges that are labeled by a call site identifier and a polarity
(Figure 2(a)). The polarity indicates the direction of flow of values, either into the
called function through a formal parameter (−), or out of the function through a
return statement (+). Function call processing is represented declaratively through
the type rules for function definitions and function calls in Figure 2(b). Value flow
into and out of a called function generates special labeled constraints between the
ids of pointed-to locations, while the contents of pointed-to locations are equated.
These labeled constraints are similar to subset constraints, except that the labels
restrict the ways in which constraints can be composed transitively.
As explained in Section 2, we wish to rule out the invalid flow of values that
arises when an edge labeled i−, representing the flow of values into a function
at call site i, is followed by an edge labeled j+, representing the flow of values
back to a different call site j.

Fig. 1. Assignment processing in Das’s algorithm. Figure (a) shows the
points-to graph after processing x = y. The domains and type rule in figure (b)
provide a declarative specification of assignment processing.

(a) Points-to graph after processing x = y.

(b) Domains, ordering and type rule:

$$
\begin{array}{rcl}
s &\in& \mathit{Symbols}\\
\tau \in \mathit{Locations} &::=& (\varphi, \alpha)\\
\varphi \in \mathit{Ids} &::=& \{s_1, \ldots, s_n\}\\
\alpha \in \mathit{Values} &::=& \bot \mid \mathit{ptr}(\tau)
\end{array}
$$

$$
\bot \le \alpha
\qquad
\frac{\varphi \subseteq \varphi'}{\mathit{ptr}(\varphi, \alpha) \le \mathit{ptr}(\varphi', \alpha)}
\qquad
\frac{A \vdash x : (\varphi, \alpha) \quad A \vdash y : (\varphi', \alpha') \quad \alpha' \le \alpha}{A \vdash \mathit{welltyped}(x = y)}
$$

Fig. 2. Function call processing in GOLF. The graph fragments in (a)
represent points-to information after processing x = f_i(y), a call to function f
with argument y at call site i. For ease of exposition, we assume that functions
are normalized: the statement f = fun (p) (r) S* defines a function f that has
a single formal parameter p, an out parameter r that holds the return value, and
a statement body S*. The labeled constraints ≤_i^− and ≤_i^+ generated at function
calls are similar to instantiation constraints used in polymorphic type inference,
except that the direction of constraints with negative (−) polarity is
reversed to match the direction of flow of values.

(a) Points-to graph fragments after processing x = f_i(y).

(b) Labeled ordering and type rules:

$$
\frac{\varphi \le_i^{p} \varphi' \qquad p \in \{+, -\}}{\mathit{ptr}(\varphi, \alpha) \le_i^{p} \mathit{ptr}(\varphi', \alpha)}
$$

$$
\frac{A \vdash p : (\varphi, \alpha) \quad A \vdash r : (\varphi', \alpha') \quad A \vdash f : \alpha \rightarrow \alpha' \quad \forall s \in S^{*} : A \vdash \mathit{welltyped}(s)}{A \vdash \mathit{welltyped}(f = \mathbf{fun}\,(p)\,(r)\,S^{*})}
$$

$$
\frac{A \vdash f : \alpha \rightarrow \alpha' \quad A \vdash x : (\varphi_x, \alpha_x) \quad A \vdash y : (\varphi_y, \alpha_y) \quad \alpha_y \le_i^{-} \alpha \quad \alpha' \le_i^{+} \alpha_x}{A \vdash \mathit{welltyped}(x = f_i(y))}
$$

Valid Flow Paths. The set of valid flow paths is characterized precisely by
the grammar shown below, taken from prior work. The sequence of labels
encountered along a path of flow edges forms a string. A path is a "valid path" iff
its sequence of labels forms a string in the context-free language recognized by
the non-terminal S:

$$
\begin{array}{rcl}
S &::=& P\,N\\
N &::=& M\,N \mid i_{-}\,N \mid \epsilon\\
P &::=& M\,P \mid i_{+}\,P \mid \epsilon\\
M &::=& i_{-}\,M\,i_{+} \mid M\,M \mid {<} \mid \epsilon
\end{array}
$$
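Membership in this language can be checked with a stack of open call sites. The Python sketch below is our own illustration. A label is `None` for an unlabeled (<) edge or a `(polarity, site)` pair. An i+ edge must close the innermost open call i, or appear while no call is open (the P part of S ::= P N), and calls still open at the end form the N part.

```python
def is_valid_path(labels):
    """Return True iff the label sequence of a flow path is derivable from S.

    Each label is None for an unlabeled (<) edge, or a (polarity, site) pair with
    polarity '-' (value flows into the callee) or '+' (value flows back out).
    """
    open_calls = []                      # unmatched i- labels, innermost last
    for label in labels:
        if label is None:
            continue                     # '<' edges never affect matching
        polarity, site = label
        if polarity == "-":
            open_calls.append(site)
        elif open_calls:
            if open_calls[-1] != site:
                return False             # i- followed by j+ with i != j
            open_calls.pop()             # i- ... i+ closes a matched M segment
        # else: an unmatched return before any open call (the P part)
    return True

print(is_valid_path([("-", "i"), None, ("+", "i")]))
print(is_valid_path([("-", "i"), ("+", "j")]))
```

The second query is exactly the invalid x-to-*q path of the example in Section 2.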



3.1 Declarative Specification 

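The GOLF algorithm can be viewed as a set of non-standard type inference rules
over a simple language of pointer-related assignments. The set of rules includes
the rules from Figure 2 and the rules from Das’s algorithm for handling the various
kinds of explicit assignment statements, shown below:

$$
\frac{A \vdash x : (\varphi, \alpha) \quad A \vdash y : (\varphi', \alpha') \quad \alpha' \le \alpha}{A \vdash \mathit{welltyped}(x = y)}
\qquad
\frac{A \vdash x : (\varphi, \alpha) \quad A \vdash y : \tau \quad \mathit{ptr}(\tau) \le \alpha}{A \vdash \mathit{welltyped}(x = \&y)}
$$

$$
\frac{A \vdash x : (\varphi, \alpha) \quad A \vdash y : (\varphi', \mathit{ptr}(\tau)) \quad \tau = (\varphi'', \alpha'') \quad \alpha'' \le \alpha}{A \vdash \mathit{welltyped}(x = {*y})}
\qquad
\frac{A \vdash x : (\varphi', \mathit{ptr}(\tau)) \quad A \vdash y : (\varphi, \alpha) \quad \tau = (\varphi'', \alpha'') \quad \alpha \le \alpha''}{A \vdash \mathit{welltyped}({*x} = y)}
$$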

3.2 Correctness 

We claim that the type rules in Figure 2 and above provide a specification
of a correct flow-insensitive but context-sensitive pointer analysis. This follows
from the observation that our type rules can be viewed as a restriction of the
type system presented by Rehof and Fähndrich. Their type system,
which has been shown to be correct, defines an algorithm with full subtyping
and polymorphism. The GOLF type rules define an algorithm with one level of
subtyping and one level of polymorphism.

There is a formal connection between constraint satisfaction in our type 
inference rules and valid paths in the GOLF points-to graph. The connection is 
provided in 

Global Storage. The goal of GOLF is to identify all valid flow induced by
function calls. In C programs, this may include flow due to uses of global
variables within functions. Some flow may occur through a global variable even
though no labeled flow edge is produced in the points-to graph. Reps et al. have
suggested treating globals as extra parameters, but this may lead to a large
increase in the size of the points-to graph. Instead, we identify nodes associated
with globals (we call these "global storage" nodes) and add self loops on these
nodes, labeled with every possible call site and polarity. This conservative
approximation ensures that we cannot omit any flow of values through global variables.

A similar problem occurs with indirect accesses through pointer-valued
parameters. One solution would be to modify our function call rule to add self loops
with the given call site label on all nodes below the nodes related by labeled flow
edges. Instead, we use the conservative approximation of treating these nodes as
global storage nodes.

3.3 Operational Algorithm 

Every symbol referenced in the program is associated with a unique location
on demand. The program is processed one assignment at a time, including
implicit assignments generated by function calls. At every assignment, locations
are unified as necessary to satisfy the type equality requirements imposed by
the non-standard type inference rules in Figure 2 and Section 3.1. Processing of
simple subset constraints and labeled constraints is delayed by introducing flow
edges between locations, as shown in Figures 1 and 2. When two locations are
unified, flow edges between the locations turn into self loops. Unlabeled (<) self
loops are discarded, but labeled ($\le_i^p$) self loops are retained in order to capture
all valid flow. The GOLF graph could therefore contain more edges than Das’s
points-to graph; in practice, the increase in edge count is very low.

Once processing of the entire program is complete, points-to sets are
produced from the points-to graph. Symbol x must be included in the points-to
set at a dereference *p iff there is a valid path from the node associated with
x to the node associated with *p. Reps et al. have observed that the presence
of such a path can be determined using CFL-Reachability queries. We use
single-source queries, one for each symbol in the program, to populate all
points-to sets.
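The single-source queries can be sketched as a breadth-first search over (node, open-call-stack) states. This Python sketch is a naive illustration, not the cubic summary-edge algorithm used in GOLF. It bounds the stack depth with a hypothetical `max_depth` parameter so that recursive call chains terminate, and may therefore miss paths in recursive graphs. It is applied to the graph of the example in Section 2, with hypothetical node names for the pointed-to locations.

```python
from collections import deque

def valid_reach(graph, src, max_depth=64):
    """Nodes reachable from src along valid (call/return-matched) flow paths.

    graph maps a node to a list of (target, label) pairs, where label is None
    for an unlabeled '<' edge or (polarity, call_site) with polarity '-'/'+'.
    """
    start = (src, ())
    seen, work = {start}, deque([start])
    while work:
        node, pending = work.popleft()
        for target, label in graph.get(node, ()):
            if label is None:
                nxt = pending
            elif label[0] == "-":
                if len(pending) >= max_depth:
                    continue               # sketch-only bound for recursive call chains
                nxt = pending + (label[1],)
            elif not pending:
                nxt = pending              # unmatched return, allowed before any open call
            elif pending[-1] == label[1]:
                nxt = pending[:-1]         # return matches the innermost open call
            else:
                continue                   # return to a different call site: invalid
            if (target, nxt) not in seen:
                seen.add((target, nxt))
                work.append((target, nxt))
    return {node for node, _ in seen}

def all_points_to(graph, symbol_nodes, deref_nodes):
    """One single-source query per symbol populates every points-to set."""
    pts = {d: set() for d in deref_nodes}
    for sym, node in symbol_nodes.items():
        for reached in valid_reach(graph, node) & set(pts):
            pts[reached].add(sym)
    return pts

example = {
    "x": [("*r", ("-", "i"))],          # &x flows into parameter r at call site i
    "y": [("*r", ("-", "j"))],          # &y flows into parameter r at call site j
    "*r": [("*p", ("+", "i")),          # the return value flows back to p at site i
           ("*q", ("+", "j"))],         # ... and to q at site j
}
print(all_points_to(example, {"x": "x", "y": "y"}, ["*p", "*q", "*r"]))
```

As expected, x reaches *p but not *q, so the points-to set at *q is {y}, while *r, the target of the formal parameter, contains both x and y.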

Global Storage. We identify and mark global storage nodes in a linear scan
of the points-to graph. Instead of adding a linear number of self edges at each
global storage node, we account for the effect of the self edges implicitly: if there
is a valid path from node u to node v, where v is a global storage node, and
there is a valid path from node v to node w, then there must be a valid path
from node u to node w. This is because any unmatched labeled edges in the
path from u to w through v can be matched by following an appropriate set of
self edges at v. In other words, the effect of a global storage node is to introduce
transitivity in CFL-Reachability queries. This leads to a modified reachability
procedure: node v is reachable from node u iff it is possible to reach v from u
by "stepping" on global storage nodes and using valid paths to "hop" from one
global storage node to the next.
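The modified reachability procedure can be phrased as a closure over global storage nodes. In this Python sketch, `valid_reach` is assumed to be any single-source valid-path query (for example, a stack-based search) and `is_global` a predicate marking global storage nodes. Both are hypothetical parameters, not GOLF’s actual interfaces.

```python
def reach_with_globals(valid_reach, src, is_global):
    """Valid-path reachability in which global storage nodes make reachability
    transitive: from any reached global node we may 'hop' along another valid path."""
    reached = set(valid_reach(src))
    todo = [v for v in reached if is_global(v)]
    hopped = set()
    while todo:
        g = todo.pop()
        if g in hopped:
            continue
        hopped.add(g)
        for v in valid_reach(g):        # the self loops at g match any open labels
            if v not in reached:
                reached.add(v)
                if is_global(v):
                    todo.append(v)
    return reached

# Toy single-source answers standing in for real valid-path queries.
valid = {"u": {"u", "g1"}, "g1": {"g1", "w", "g2"}, "g2": {"g2", "z"},
         "w": {"w"}, "z": {"z"}}
print(sorted(reach_with_globals(valid.__getitem__, "u", lambda n: n.startswith("g"))))
```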



3.4 Complexity 

The algorithm has two steps: an assignment processing step, which produces a 
points-to graph with flow edges, and a flow propagation step. The first step has 
the same complexity as Steensgaard’s algorithm: it uses linear space and has
almost linear running time (in the size of the program). Every implicit assignment
causes the addition of a single labeled edge. The number of implicit assignments
is linear even in the presence of indirect calls, because there is a single signature
for all possible target functions at a call site.

The flow step involves a CFL-Reachability query on the graph for each symbol
in the program. The worst-case cost of an all-pairs CFL-Reachability query over
a graph is cubic in the number of graph nodes. Therefore, the complexity
of GOLF is cubic in the size of the program.



3.5 Efficient CFL-Reachability 

In this subsection we explain three insights that allow us to efficiently compute 
points-to sets for large programs. 

Memoization. Our first insight is that a simple memoization, borrowed from
previous work, allows us to amortize the cost of multiple queries by avoiding
repeated work. Our experiments show that in every points-to graph, there is a
single node (the “blob”) that has a large number of outgoing flow edges. In ev- 
ery graph, the blob has an order of magnitude more outgoing edges than any 
other node. Now consider the set of symbols that have valid paths to the blob. 
For each such symbol, we would repeat a scan of the subgraph originating from 
the blob. Instead, we would like to perform the scan from the blob exactly once, 
cache the result, and share this across all symbols that reach the blob. 

We pre-compute the set of nodes reachable from the blob (“frontier nodes”), 
and the set of symbols that reach the blob. For every symbol that does not reach 
the blob, we perform a forward scan to dereference nodes, as usual. For a symbol 
that reaches the blob, we perform a forward scan, but we stop at frontier nodes. 
Once we have processed all symbols, we append the symbols that reach the blob 
to the points-to set at every frontier node. 

Consider a symbol for which we compose the scan from the blob with the scan 
from the symbol as described above. Because CFL-Reachability is not transitive, 
we may be treating more nodes as reachable from the symbol than necessary. 
However, if the blob is a global storage node, we can compose without loss of 
precision. The only programs for which the blob is not a global storage node are 
extremely small, and therefore do not require memoization. On the other hand,
there may be some frontier nodes at which the scan from a symbol arrives with
fewer matching requirements for a valid path than the scan from the blob. If we
stop the scan at these nodes and compose with the scan from the blob, we may 
fail to visit some nodes. Therefore, we identify such cases during the scan from 
the symbol, and continue the scan through the frontier node. 

This simple memoization results in dramatic speedup. Our empirical evidence 
shows that there are many scans that involve the blob, for which we amortize 
the scan cost. All remaining scans cover very small regions of the graph. 

We believe that the existence of the blob is not a coincidence. Rather, it 
reflects the presence of global variables that are referenced throughout a pro- 
gram. The blob node is an accumulator for large points-to sets. These sets are 
poor targets for improvement via more precise pointer analysis, because they 
are unlikely to shrink to very small sets, and because a precise analysis is likely 
to spend considerable resources tracking global variables. Points-to sets outside 
the reach of the blob are better targets for more precise analysis. 
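The memoization can be sketched as follows. For simplicity, this Python illustration uses plain, label-insensitive reachability. That matches the case discussed above where the blob is a global storage node, so composing scans loses no precision. It does not model the special handling of frontier nodes reached with fewer matching requirements. The function names are hypothetical.

```python
from collections import defaultdict

def forward(graph, start, stop=frozenset()):
    """Plain forward reachability; nodes in `stop` are reached but not expanded."""
    seen, work = {start}, [start]
    while work:
        node = work.pop()
        if node in stop:
            continue
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen

def points_to_memo(graph, symbols, blob):
    frontier = forward(graph, blob)                  # one shared scan from the blob
    reverse = defaultdict(list)
    for node, succs in graph.items():
        for succ in succs:
            reverse[succ].append(node)
    reaches_blob = forward(reverse, blob) & set(symbols)
    pts = defaultdict(set)
    for sym in symbols:
        # Symbols that reach the blob stop at frontier nodes: everything beyond
        # a frontier node is itself a frontier node and is handled below.
        stop = frontier if sym in reaches_blob else frozenset()
        for node in forward(graph, sym, stop):
            pts[node].add(sym)
    for node in frontier:                            # append the shared result once
        pts[node] |= reaches_blob
    return pts

graph = {"a": ["blob"], "b": ["blob", "c"], "blob": ["f1", "f2"],
         "f1": ["f2"], "d": ["c"]}
print(dict(points_to_memo(graph, ["a", "b", "d"], "blob")))
```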

Global Storage. Our second insight is that we can use the transitive behaviour
of global storage nodes to make a single scan more efficient. Global storage nodes
serve as points where we can use a divide-and-conquer strategy to form longer
valid paths from shorter valid paths without enforcing matching requirements.

Summary Edges. Our algorithm for a single CFL-Reachability query is based
on a demand algorithm outlined by Horwitz et al., which improves
the efficiency of queries by adding special summary edges to the graph. We have
adapted their algorithm to handle nodes that are shared across functions because
of unification, and to handle global storage.



4 Alias Frequency 

We are interested in estimating the impact of pointer analysis on compiler opti- 
mizations, in a manner that is independent of a particular optimizing compiler or 
optimization. However, previously defined measures of precision for pointer anal- 
ysis that are independent of a particular optimization, such as average points-to 
set size and number of singleton points-to sets, provide little indication of the 
ability of a pointer analysis to enable optimizations by identifying memory ac- 
cesses as not aliased. 

Therefore, we propose “alias frequency”, a new metric that estimates the 
precision of alias information produced by a given pointer analysis. 

4.1 Simple Alias Frequency 

For a given program, we define queries to be a set of alias queries. Each query
(e1, e2) involves a pair of memory access expressions occurring statically in the
program. The alias frequency of a given pointer analysis is the percentage of
queries for which the analysis says that e1 and e2 may refer to the same memory
location in some execution of the program:

$$
\text{simple alias frequency} = \frac{\sum_{(e_1, e_2) \in \mathit{queries}} \alpha(e_1, e_2)}{|\mathit{queries}|} \times 100,
\qquad
\alpha(e_1, e_2) = \begin{cases} 1 & \text{if } e_1, e_2 \text{ may be aliases}\\ 0 & \text{otherwise} \end{cases}
$$



Alias Queries. An extreme approach to generating alias queries would be to 
consider all pairs of memory accesses encountered anywhere in the program. This 
would result in a large number of pairs of accesses from different functions. Most 
of these pairs are uninteresting, because a typical optimizer will not optimize 
code across function boundaries. Therefore, we consider only alias queries where 
both memory access expressions occur in the body of the same function (there 
may be duplicate pairs). We believe these queries represent most intra-procedural
optimizations performed in commonly used C compilers.¹

Some expressions contain multiple dereference operators. In order to limit
the number of queries, we consider only top-level memory accesses from
assignment expressions, conditional expressions, and function arguments.² We have
experimented with different criteria for selecting queries (such as including sub- 
expressions of nested dereferences, and ignoring function arguments), and have 
found that our results remain consistent. 

Categorizing Queries. We categorize memory accesses based on whether they 
require pointer information to resolve. We define a “symbol-access” recursively: 
a symbol-access is a variable, a field access operation on a symbol-access, or 
an array index operation on a symbol-access of array type. Every remaining 
memory access, including a dereference, an arrow operation, or an array index 
operation on an object of pointer type, is a “pointer-access” . Every alias query 
relating two symbol-accesses can be answered without pointer analysis. If the 
two symbol-accesses refer to the same variable, we say they may be aliases: 



$$
\alpha(s_1, s_2) = \begin{cases} 1 & \text{if } \mathit{var}(s_1) = \mathit{var}(s_2)\\ 0 & \text{otherwise} \end{cases}
$$



Measuring a Given Pointer Analysis. Given a pointer analysis that produces
a points-to set pts(e) for every expression e (we set pts(e) = {var(e)} for a
symbol-access e), we can answer queries involving pointer-accesses. Two accesses
may be aliases if and only if their points-to sets overlap:

$$
\alpha(e_1, e_2) = \begin{cases} 1 & \text{if } \mathit{pts}(e_1) \cap \mathit{pts}(e_2) \neq \emptyset\\ 0 & \text{otherwise} \end{cases}
$$



Measuring the Best and Worst Possible Pointer Analysis. We are es- 
pecially interested in understanding the gap in alias precision between scalable 
pointer analyses and more precise algorithms. Therefore, we create an artificial 
lower bound analysis that under-estimates the alias frequency of the best possible 
safe pointer analysis, by treating every query involving at least one pointer-access
as not aliased. The only exception is when GOLF determines that a pair of
accesses refer to the same stack or global symbol v; the lower bound analysis
treats these pairs as aliases:³

$$
\alpha(e_1, e_2) = \begin{cases} 1 & \text{if } \mathit{pts}(e_1) = \mathit{pts}(e_2) = \{v\}\\ 0 & \text{otherwise} \end{cases}
$$

¹ Notice that these queries would include aliases between globals and locals referenced
in the body of the same function.

² Given = x, we consider and x, but not


The lower bound analysis has the property that it is at least as precise as the 
best possible pointer analysis on every alias query. Therefore, if a given pointer 
analysis is close in alias frequency to the lower bound analysis, it must be at 
least as close to any more precise safe pointer analysis. 

We also create an artificial upper bound analysis, by treating every query in- 
volving at least one pointer-access as aliases. The upper bound analysis indicates 
whether any form of pointer analysis is necessary for a given program. 

Our metric over-estimates the alias frequency of the lower bound analysis, 
because a pair of accesses of the same variable may not be aliases if the accesses 
refer to different structure fields. However, we are concerned with the difference 
between a given analysis and the lower bound. Consider pairs of accesses where 
at least one access is a pointer-access. The lower bound analysis treats such pairs 
as not aliased, whereas any of our pointer analyses could potentially improve its 
response to these queries using field distinction. For pairs of symbol-accesses, 
all of the analyses, including the lower bound analysis, suffer equally. Therefore, 
our lack of field distinction leads us to conservatively over-estimate the precision 
gap between a given pointer analysis and the lower bound analysisj 



4.2 Weighted Alias Frequency 

As mentioned in the introduction, we would like to estimate the potential impact
on run-time of the alias queries on which a pointer analysis produces a possibly
inaccurate response. Therefore, we weight the response $\theta(e_1, e_2)$ of any analysis
to every query by the sum of the dynamic execution counts ($\mathit{num}(e_1) + \mathit{num}(e_2)$,
gathered from profile data) of the accesses in the query:



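$$\text{weighted alias frequency} = \frac{\sum_{(e_1, e_2) \in \mathit{queries}} \theta(e_1, e_2) \times \big(\mathit{num}(e_1) + \mathit{num}(e_2)\big)}{\sum_{(e_1, e_2) \in \mathit{queries}} \big(\mathit{num}(e_1) + \mathit{num}(e_2)\big)} \times 100$$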



A small difference between the weighted alias frequency of a given pointer 
analysis and the lower bound analysis means that a more precise pointer analysis 
is unlikely to enable additional optimizations that improve run-time significantly. 
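The weighted formula can be sketched in C as below. The struct layout and names are hypothetical; each query's response is assumed to be 0 or 1 as in the alias definitions above, and the execution counts are assumed to come from a profiling run.

```c
#include <assert.h>

/* One alias query (hypothetical layout): the analysis's response
   (1 = may alias) and the profiled execution counts of its two accesses. */
typedef struct {
    int response;
    unsigned long num1, num2;
} WeightedQuery;

/* Percentage of the total query weight carried by "may alias" answers. */
double weighted_alias_frequency(const WeightedQuery *q, int n) {
    double aliased = 0.0, total = 0.0;
    for (int i = 0; i < n; i++) {
        double w = (double)q[i].num1 + (double)q[i].num2;  /* query weight */
        aliased += q[i].response * w;
        total += w;
    }
    assert(total > 0.0);   /* at least one access must have executed */
    return 100.0 * aliased / total;
}
```

With one aliased query whose accesses execute 30 and 10 times and one non-aliased query whose accesses execute 5 times each, the simple alias frequency is 50% but the weighted alias frequency is 80%, because the frequently executed pair dominates.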

Footnote: We create dummy symbols, one at each dynamic allocation site, to represent heap
storage. The lower bound analysis does not treat accesses of the same heap symbol
as aliases, whereas our pointer analyses do.

Footnote: The same argument applies to accesses of static arrays.
Table 1. Benchmark data. For each program, the table shows the lines
of code, the AST node count, and the running time (in seconds) of GOLF.

Program  | LOC       | AST nodes | Time (s)
compress | 1,904     | 2,234     | 0.05
li       | 7,602     | 23,379    | 0.50
m88ksim  | 19,412    | 65,967    | 0.88
ijpeg    | 31,215    | 79,486    | 1.13
go       | 29,919    | 109,134   | 0.98
perl     | 26,871    | 116,490   | 1.53
vortex   | 67,211    | 200,107   | 4.09
gcc      | 205,406   | 604,100   | 7.17
Word97   | 2,150,793 | 5,961,129 | 133.66



5 Experiments 



We have produced a modular implementation of GOLF using the AST Toolkit, which is itself an extension of the Microsoft Visual C++ compiler. Our implementation handles all of the features of C. Details may be found in [...]. We implemented GOLF by modifying the rules for parameter passing and return statements in our implementation of Das’s algorithm, and by adding a CFL-Reachability engine. Our implementation of Das’s algorithm has been tested extensively. Apart from all of the usual testing, we verified the correctness of our implementation of GOLF in two ways. First, we performed reachability queries forward and backward, with and without memoization, and verified that we get the same results in every case. Second, we tested our implementation of CFL-Reachability by treating labeled edges as unlabeled and verifying that we obtain the same points-to sets as with Das’s algorithm.
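The second check, treating labeled edges as unlabeled, can be illustrated with a toy version of CFL-reachability over call/return labels. This is a sketch under stated assumptions, not GOLF's engine: it assumes an acyclic value-flow graph, uses a depth-first search that carries an explicit stack of open call sites, and omits globals, unification, and memoization. The example graph models a hypothetical identity function called at two sites; all names are illustrative.

```c
#include <assert.h>

#define MAX_DEPTH 8

/* label > 0: "(k", flow from an actual into a callee at call site k.
   label < 0: ")k", flow out of a callee back to call site k.
   label == 0: plain (intraprocedural) flow. */
typedef struct { int from, to, label; } Edge;
typedef struct { const Edge *edges; int n_edges; } Graph;
typedef struct { int items[MAX_DEPTH]; int depth; } Stack;  /* open calls */

/* DFS that only takes ")k" when it matches the innermost open "(k".
   A ")k" on an empty stack is allowed (flow that started in the callee).
   The stack is passed by value so sibling branches do not interfere.
   Assumes the graph is acyclic, so the search terminates. */
static int reach(const Graph *g, int node, int target, Stack s, int use_labels) {
    if (node == target)
        return 1;
    for (int i = 0; i < g->n_edges; i++) {
        const Edge *e = &g->edges[i];
        if (e->from != node)
            continue;
        Stack t = s;
        if (use_labels && e->label > 0) {
            assert(t.depth < MAX_DEPTH);
            t.items[t.depth++] = e->label;           /* enter callee */
        } else if (use_labels && e->label < 0 && t.depth > 0) {
            if (t.items[t.depth - 1] != -e->label)
                continue;                            /* unrealizable path */
            t.depth--;                               /* matched return */
        }
        if (reach(g, e->to, target, t, use_labels))
            return 1;
    }
    return 0;
}

/* use_labels == 0 treats every labeled edge as unlabeled. */
int cfl_reachable(const Graph *g, int src, int dst, int use_labels) {
    Stack empty = { {0}, 0 };
    return reach(g, src, dst, empty, use_labels);
}

/* Hypothetical example: id(q) { return q; } called at sites 1 and 2.
   Nodes: 0 = actual at site 1, 1 = actual at site 2, 2 = formal q,
   3 = id's return value, 4 = result at site 1, 5 = result at site 2. */
static const Edge example_edges[] = {
    {0, 2, 1}, {1, 2, 2}, {2, 3, 0}, {3, 4, -1}, {3, 5, -2},
};
static const Graph example = { example_edges, 5 };
```

With labels respected, node 0 reaches the result at site 1 but not the result at site 2; with labels ignored it reaches both, which is the context-insensitive answer.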

Benchmark Programs. Table 1 shows our benchmark programs, consisting of the integer benchmarks from SPEC95, and a version of Microsoft Word. For each benchmark, we list the total lines of source code (including comments and blank lines), as well as the number of AST nodes (a more accurate measure of program size), and the analysis time (in seconds) for GOLF, averaged over 5 runs. Analysis time includes time to analyze each compilation unit (excluding parse time), time to write out object files, time to read in all of the object files, perform unifications, and compute points-to sets exhaustively at all static dereference points in the program using CFL-Reachability. All of our experiments were conducted on a Dell 610 desktop PC running Windows 2000, with 512MB RAM and a single 800MHz Intel Pentium III processor.



5.1 Alias Precision of Pointer Analysis 

Table 2 shows the simple and weighted alias frequencies of various pointer analyses. We obtained execution counts for computation of weighted alias frequency by instrumenting the benchmarks and running them on their SPEC reference inputs. The data shows that all of the scalable pointer analyses are surprisingly close to the lower bound analysis. Das’s algorithm does as well as the lower bound analysis on all but 5.2% of the alias queries on our benchmark programs.

Table 2. Precision of various pointer analyses. For each benchmark program, the table shows the simple alias frequency of the lower bound analysis (Lower), GOLF, Das’s algorithm (Das00), Steensgaard’s algorithm (Ste96), and the upper bound analysis (Upper), and the difference in simple alias frequency between Das00 and Lower (Diff). The same data is also shown for weighted alias frequency. We were not able to obtain dynamic execution counts for Word97.

         | Simple alias frequency                        | Weighted alias frequency
Program  | Lower | GOLF  | Das00 | Ste96 | Upper | Diff  | Lower | GOLF  | Das00 | Ste96 | Upper | Diff
compress | 13.8  | 14.02 | 14.02 | 14.13 | 32.38 | 0.22  | 9.93  | 9.93  | 9.93  | 9.93  | 28.82 | 0.0
li       | 10.17 | 18.84 | 18.84 | 19.53 | 42.27 | 8.67  | 13.10 | 22.98 | 22.98 | 22.99 | 62.12 | 9.88
m88ksim  | 14.97 | 17.0  | 17.0  | 20.44 | 40.5  | 2.03  | 11.53 | 13.77 | 13.77 | 18.67 | 37.77 | 2.24
ijpeg    | 5.93  | 17.9  | 17.9  | 19.14 | 61.49 | 11.97 | 5.55  | 16.31 | 16.31 | 16.31 | 57.42 | 10.76
go       | 7.85  | 7.87  | 7.87  | 7.87  | 8.35  | 0.02  | 9.5   | 9.73  | 9.73  | 9.73  | 15.53 | 0.23
perl     | 9.54  | 14.45 | 14.45 | 14.53 | 45.17 | 4.91  | 3.45  | 12.56 | 12.56 | 12.56 | 53.87 | 9.11
vortex   | 6.12  | 10.81 | 10.81 | 15.71 | 42.69 | 4.69  | 3.7   | 7.18  | 7.18  | 14.51 | 50.20 | 3.48
gcc      | 5.49  | 11.98 | 11.98 | 14.64 | 50.36 | 6.49  | 4.62  | 9.36  | 9.36  | 10.72 | 51.66 | 4.74
Word97   | 6.63  | 14.45 | 15.07 | 20.37 | 44.21 | 8.44  | -     | -     | -     | -     | -     | -
Average  | 8.94  | 14.15 | 14.22 | 16.26 | 40.82 | 5.27  | 7.67  | 12.73 | 12.73 | 14.43 | 44.67 | 5.06

To better understand the loss in precision from scalable pointer analysis, we 
manually examined a fraction of the queries on which Das’s algorithm differs 
from the lower bound analysis. We found that in almost every case, either the 
lower bound analysis is unsound, or we could have used straightforward field 
distinction to resolve the query as not aliased. Therefore, we believe that the 
gap in alias frequency between scalable pointer analyses and the best possible 
pointer analysis is in fact much less than 5%. 

The data in Table 2 also shows that the difference in weighted alias frequency
between Das’s algorithm and the lower bound analysis is very similar to the 
difference in simple alias frequency, for every benchmark. We therefore claim 
that the queries on which Das’s algorithm is inaccurate are not likely to provide 
significant additional opportunity for optimization. 

On all of our benchmarks, the differences in weighted alias frequencies be- 
tween various analyses are very similar to the differences in simple alias fre- 
quency between the same analyses. Hence, we argue that simple alias frequency 
is a useful indicator of precision for implementors of pointer analysis who do not 
have access to either profile data or optimizing compilers that can consume alias 
information produced by their analyses. 

Das00 vs GOLF. Table 2 shows that the alias frequency of Das’s algorithm
is not improved by the addition of context-sensitivity for any benchmark other 
than Word97. This data shows that in practice, scalable pointer analyses do not 
sacrifice optimization opportunity because of a lack of context-sensitivity. 



Footnote: The reference input for gcc consists of 56 C source files. We ran gcc on the five
largest source files and averaged the execution counts.
Ste96. The data also shows that Steensgaard’s algorithm is surprisingly close to the lower bound analysis, given the relatively poor precision of the algorithm in terms of points-to set size (see Table 3). We believe that this is largely because the pollution of points-to sets that occurs in Steensgaard’s algorithm leads to accumulation of variables across functions, but this pollution does not result in conservative alias relationships between pointer variables from the same function. Also, smaller points-to sets do not imply lower alias frequency if the smaller sets contain the same subset of common symbols. Finally, points-to set sizes are often artificially inflated by the inclusion of symbols that are out of scope. Tables 2 and 3 clearly show that traditional measures of precision for pointer analysis do not reflect the ability of the analysis to produce good alias information.
Das00 vs Andersen. Previous work has shown that Das’s algorithm and Andersen’s algorithm produce almost identical points-to sets. Therefore, their alias frequencies can be expected to be almost identical as well.
Limitations. First, although we measure alias frequency, we are really evaluating pointer analysis. It may be possible to significantly improve the alias frequency of any analysis, including the lower bound analysis, by adding a structure field analysis and/or an array index analysis. Second, our results apply only to C programs; whether they apply to programs written in C++ or Java is an open question. Third, aggressive inter-procedural optimizers may be able to utilize alias information inter-procedurally. These opportunities are not reflected in our selection of alias queries.

One potential concern with our results may be that scalable pointer analyses 
appear close to the lower bound analysis because we have swamped the query set 
with pairs of symbol accesses. We find that, on average, 30% of the queries
require some form of pointer information. This large percentage indicates that 
there is a need for at least some form of pointer analysis in compilers. 

As might be expected, we generate a large number of alias queries (10 million 
queries for Word97). Each query may require reachability on the points-to graph. 
By using the amortization technique described in Section 3, we are able to
answer alias queries extremely efficiently. We can answer all queries for Word97 
in less than seven minutes. 

There may be regions of a program where a more accurate analysis may 
eliminate aliases. For instance, consider a linked list traversal using previous and 
current pointers. Our results show that a useful approach may be to first run a 
scalable pointer analysis, and then apply a more precise shape analysis locally, 
on a few functions. These functions can be identified using alias frequency. 
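The linked-list case mentioned above can be made concrete with the usual previous/current deletion loop below (a hypothetical example, not taken from the benchmarks). If every node created at one allocation site is represented by a single dummy heap symbol, a flow-insensitive analysis gives prev and cur the same points-to set, so the accesses through prev and cur would be reported as possible aliases even though prev always trails cur by one node. A shape analysis applied locally to such a function could, in principle, tell them apart.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { int key; struct Node *next; } Node;

/* Unlink and free the first node holding key, using the classic
   previous/current traversal; returns the (possibly new) list head. */
Node *remove_key(Node *head, int key) {
    Node *prev = NULL, *cur = head;
    while (cur != NULL && cur->key != key) {
        prev = cur;              /* prev always trails cur by one node */
        cur = cur->next;
    }
    if (cur == NULL)
        return head;             /* key not present */
    if (prev == NULL)
        head = cur->next;        /* unlinking the first node */
    else
        prev->next = cur->next;  /* prev and cur are distinct nodes here */
    free(cur);
    return head;
}
```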



5.2 Performance and Precision of GOLF 

Performance. In the figure below, we chart the running times for GOLF from
Table 1. We use the ratio of running time to program size. The chart shows that
this ratio is fairly steady as program size grows, indicating that the analysis
scales linearly with program size. GOLF requires roughly twice as much time
and as much memory as Das’s algorithm. We do not present detailed data on
space consumption, which is very low. GOLF requires 20MB for Word97.
[Figure: ratio of GOLF running time to program size, by benchmark]
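The ratio plotted in the figure can be recomputed directly from Table 1. The short C program below (names hypothetical) divides GOLF's running time by the AST node count and reports microseconds per node.

```c
#include <assert.h>
#include <stdio.h>

/* Rows of Table 1: AST node count and GOLF running time in seconds. */
typedef struct { const char *name; long ast_nodes; double seconds; } Bench;

static const Bench table1[] = {
    {"compress", 2234, 0.05},   {"li", 23379, 0.50},
    {"m88ksim", 65967, 0.88},   {"ijpeg", 79486, 1.13},
    {"go", 109134, 0.98},       {"perl", 116490, 1.53},
    {"vortex", 200107, 4.09},   {"gcc", 604100, 7.17},
    {"Word97", 5961129, 133.66},
};
static const int n_bench = sizeof table1 / sizeof table1[0];

/* Running time per AST node, in microseconds. */
double usec_per_node(const Bench *b) {
    assert(b->ast_nodes > 0);
    return b->seconds * 1e6 / (double)b->ast_nodes;
}

void print_ratios(void) {
    for (int i = 0; i < n_bench; i++)
        printf("%-9s %6.2f us/node\n", table1[i].name, usec_per_node(&table1[i]));
}
```

Across the nine benchmarks the cost stays between roughly 9 and 23 microseconds per AST node (go is lowest; compress and Word97 are highest), while program size grows by more than three orders of magnitude, which is consistent with the linear scaling described above.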



Precision vs Other Scalable Pointer Analyses. Table 3 shows the precision
of GOLF measured using traditional metrics. The table shows the average size 
of points-to sets at dereference points and the number of singleton points-to sets 
for each benchmark. Following previous work, the size of the points-to set at a 
dereference point is the number of program symbols, including dummy symbols 
produced at dynamic allocation sites, in the points-to set of the dereference 
expression. All three analyses were run with the same settings, using the same 
implementation. We omit data for points-to sets at indirect call sites, although 
GOLF can also be used to improve points-to sets for function pointers as well. 
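Both traditional metrics in Table 3 can be computed from the points-to sets at static dereference points, as in the C sketch below. It assumes each points-to set fits in a 64-bit bitset with one bit per program symbol (dummy heap symbols included); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of symbols in a points-to set stored as a 64-bit bitset. */
static int set_size(uint64_t s) {
    int n = 0;
    while (s) {
        s &= s - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

typedef struct { double avg_size; int singletons; } DerefStats;

/* pts[i] is the points-to set of the i-th static dereference expression. */
DerefStats deref_stats(const uint64_t *pts, int n_derefs) {
    DerefStats st = {0.0, 0};
    long total = 0;
    assert(n_derefs > 0);
    for (int i = 0; i < n_derefs; i++) {
        int k = set_size(pts[i]);
        total += k;
        if (k == 1)
            st.singletons++;   /* an opportunity for a strong update */
    }
    st.avg_size = (double)total / n_derefs;
    return st;
}
```

For the sets {a}, {a, b}, {a, b, c} and {c}, the average size is 1.75 and two dereference points have singleton sets.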

Points-to sets with single elements represent opportunities for replacing conditional updates with strong updates. Smaller points-to sets may lead to greater efficiency in subsequent analyses, as well as less run-time overhead in systems that instrument code. The data shows that GOLF produces more singleton sets than Das’s algorithm for several benchmarks. On the whole, our results appear to be consistent with the results of Foster et al., who found little improvement in precision from the addition of polymorphism to a pointer analysis with subtyping. Our benchmark programs are much larger than those in their study, and we do see greater improvement on larger programs.

Precision vs Full Polymorphic Subtyping. GOLF approximates a full polymorphic subtyping algorithm by restricting both subtyping and polymorphism to one level in the type structure. Das has already shown that the one level restriction of subtyping does not cause loss in precision. The data for the FRD00 and OneLev columns in Table 3 shows that the one level restriction of polymorphism does not cause precision loss either.

It is therefore likely that GOLF extracts most of the precision of an analysis 
with full polymorphism and subtyping. However, it is still possible that the com- 
bination of full polymorphism and full subtyping may eliminate more spurious 
flow of values than the combination of limited polymorphism and limited sub- 
typing used in GOLF. We were unable to perform a direct comparison, because 
it has not been possible to scale polymorphic subtyping to large programs. 
Table 3. Precision of various pointer analyses. For each benchmark program, the table shows the average size of points-to sets at static dereference points for Ste96, Das00, GOLF, a polymorphic version of Steensgaard’s algorithm (FRD00), and a one level restriction of FRD00 (OneLev). The table also shows the number of dereference points with singleton points-to sets found using each of these algorithms. FRD00 and OneLev cannot be compared directly with the other analyses, because they are based on Rehof’s implementation of a polymorphic version of Steensgaard’s algorithm. We were not able to use this implementation to analyze Word97.

         | Average thru-deref size                         | Singleton sets
Program  | Ste96    | Das00    | GOLF    | FRD00  | OneLev | Ste96  | Das00  | GOLF   | FRD00 | OneLev
compress | 2.1      | 1.22     | 1.22    | 2.9    | 2.9    | 36     | 47     | 47     | 30    | 30
li       | 287.7    | 185.62   | 185.62  | 189.63 | 194.80 | 15     | 39     | 39     | 15    | 15
m88ksim  | 86.3     | 3.29     | ?       | ?      | 15.16  | 116    | 638    | ?      | 256   | 251
ijpeg    | ?        | ?        | 11.78   | 13.01  | ?      | 1,671  | 3,287  | 3,287  | 1,802 | ?
go       | 45.2     | ?        | 14.79   | 16.06  | 16.06  | 28     | 28     | 28     | 23    | 23
perl     | 36.1     | ?        | 21.90   | 23.89  | 23.91  | ?      | 1,023  | 1,155  | 307   | 306
vortex   | 1,064.5  | 59.86    | 59.30   | 57.42  | ?      | 808    | 4,855  | 4,855  | 4,764 | ?
gcc      | ?        | ?        | ?       | 90.62  | ?      | 1,323  | 6,830  | 6,896  | 2,637 | 2,598
Word97   | 27,176.3 | 11,219.5 | 7,756.6 | -      | -      | 11,577 | 41,904 | 43,142 | -     | -

Context-Sensitive Clients. In order to populate points-to sets, we accumulate all flow of pointer values into a function from its callers. In the example below, the points-to set of l is the same using Das’s algorithm or GOLF:

void LockWrap(Lock *l) { AcquireLock(l); }
void Read(Obj o1, Obj o2) { LockWrap(&o1.lock); LockWrap(&o2.lock); }

However, the labeled edges in GOLF can be used to produce distinct sum- 
maries of function behaviour at different call sites. These summaries can be 
leveraged by a client of GOLF, as long as the client is context-sensitive. For 
instance, a context-sensitive analysis that tracks lockable objects can use the 
summaries of LockWrap produced by GOLF to conclude that o1 must be locked
by the first call to LockWrap. Das’s algorithm can only say that o1 may be locked
by either call. We believe that this is the real value of GOLF. 

6 Related Work 

GOLF. As we mentioned in the introduction, our work on GOLF follows a long 
line of research on context-sensitive pointer analysis. The most precise algorithms 
are control-flow-sensitive and context-sensitive. It is not clear whether any of these algorithms will scale beyond 50,000 lines of code. Previous algorithms for control-flow-insensitive context-sensitive pointer analysis include [...]. The first two algorithms follow every edge in the call graph, whether the call graph is pre-computed or constructed on the fly. This may limit their applicability to large programs, which have very large (quadratic sized) call graphs due to indirect calls. On the other hand, [...] appears to scale well, but it does not provide any degree of subtyping, which is important for larger programs.

GOLF is a context-sensitive algorithm with subtyping that scales to large 
programs. It is an extension of Das’s algorithm. We apply his one level flow idea to restrict polymorphism without losing precision, and we borrow his caching technique to speed up our flow computation. GOLF can also be viewed as a restriction of Rehof and Fähndrich’s general polymorphic subtyping framework. With some modifications to account for unification, globals, and pointers, GOLF can be viewed as a variant of Reps, Horwitz and Sagiv’s framework. GOLF is a scalable instantiation of these two frameworks.

Liang and Harrold have described a mechanism for extracting some context-sensitivity from context-insensitive pointer analyses. We believe that their approach could be used to add context-sensitivity to Das’s algorithm. It is not clear how GOLF would compare with the resulting analysis.

Ruf and Foster et al. have reported empirical investigations of
the added precision provided by context-sensitive pointer analysis. Both argue
that there is little gain in precision from context-sensitivity. Our results are 
consistent with theirs, and extend their conclusions to much larger programs. 
However, we believe that the real value of GOLF is as a context-sensitive value 
flow analysis that produces polymorphic summaries of function behaviour. 
Impact of Flow-Insensitive Pointer Analysis. The issue we have addressed 
in this paper is the usefulness of control-flow-insensitive pointer analyses in com- 
piler optimizations. Although conventional wisdom says that the lack of flow- 
sensitivity and structure-field distinction can severely limit the usefulness of 
scalable pointer analyses, there is no empirical evidence to support this belief. In fact, several studies have produced results that contradict this idea. Cheng and Hwu have shown that a context-sensitive pointer analysis with subtyping can enable many optimizations in a compiler. Their result inspired us to develop a scalable context-sensitive pointer analysis with subtyping. Hind and Pioli have shown that flow-sensitivity has little impact on the precision of pointer analysis. Diwan et al. have shown that for a particular Java optimization, a flow-insensitive pointer analysis provides all of the precision that can be exploited by an optimizer. Our results are consistent with all of these studies.

We know of no previous work that uses alias frequency to estimate the impact 
of pointer analysis on compiler optimizations. Diwan et al. have studied the effect
of pointer analysis on a particular Java optimization at several levels, including
static points-to information, optimization opportunities enabled, and run-time
improvement. Ideally, we would like to repeat their study for every
conceivable optimization and every pointer analysis. We propose weighted alias
frequency as a practical replacement for such a large set of experimental studies. 

One avenue for further improvement in precision that is suggested by our 
results is to run a scalable analysis globally, and apply more precise analysis
locally. Rountev et al. have proposed this idea. Our results provide
evidence that supports their approach. They use Steensgaard’s algorithm as the
scalable global analysis. We believe that using GOLF as the global analysis would 
lead to greater precision. Also, our alias frequency measure can be used in their 
framework, to identify target functions for more precise analysis. 



7 Conclusions 

In this paper, we have provided experimental evidence to support the claim that 
scalable pointer analyses provide precise alias information for C programs. We
believe this is a strong argument for the routine use of scalable pointer analysis
in optimizing compilers. We have also developed a framework for measuring 
the impact of pointer analysis on compiler optimizations in a manner that is 
independent of a particular optimization or optimizing compiler. Finally, we have 
presented GOLF, the first algorithm that can trace the flow of values in very 
large C programs, while providing a degree of subtyping and context-sensitivity. 
We believe that the most useful method for analysis of large programs may be 
to use a scalable global analysis in conjunction with an expensive local analysis. 
Abstract. This paper presents a modular algorithm that efficiently
computes parameterized pointer information, in which symbolic names 
are introduced to identify memory locations whose addresses may be 
passed into a procedure. Parameterized pointer information can be used 
by a client program analysis to compute parameterized summary infor- 
mation for a procedure. The client can then instantiate such information 
at each specific callsite by binding the symbolic names. Compared to 
non-parameterized pointer information, in which memory locations are 
identified using the same name throughout a program, parameterized 
pointer information lets the client reduce the spurious information that 
is propagated across procedure boundaries. Such reduction will improve 
not only the precision, but also the efficiency of the client. The paper also 
presents a set of empirical studies. The studies show that (1) the algo- 
rithm is efficient; and (2) using parameterized pointer information may 
significantly improve the precision and efficiency of program analyses. 



1 Introduction 



Various pointer analyses have been developed to facilitate program analyses of C 
programs. To support these program analyses, a pointer analysis must associate 
names with memory locations. A pointer analysis must also provide information 
that determines the memory locations accessed through pointer dereferences. 
With this information, a program analysis can first replace the pointer dereferences in a program with the memory locations accessed through such dereferences, and then analyze the program in the usual way.

Pointer analysis algorithms can differ in the way in which they assign names 
to memory locations. Such differences can significantly impact the precision and 
the efficiency of the program analyses that use the pointer information. Many 
existing pointer analysis algorithms (e.g., [...]) use the same
name to identify a memory location throughout the program. Because a memory 
location may be accessed throughout the program, its name can appear in several 
procedures. Therefore, a program analysis that uses this pointer information 
usually treats such a name as if it were a global variable name. 

A few existing pointer analysis algorithms [...] assign different names to a memory location in different procedures. When the address of a memory location can be passed into a procedure from a callsite, these algorithms use a symbolic name to identify the memory location within the procedure. If the pointer information computed for the procedure is used under different calling contexts, the symbolic name can be used to identify different memory locations. For example, symbolic name nv can be used to identify the memory locations whose addresses are passed into foo() (Figure 1(a)) through p. Under the context of statement 7, nv identifies x. Under the context of statement 8, nv identifies y. The symbolic names used by the algorithms act like reference parameters. Thus, we refer to such symbolic names as auxiliary parameters, and refer to pointer information containing auxiliary parameters as parameterized pointer information.

(a) Program 1

1 int x;
2 void foo(int *p) {
3     *p = *p + 1;
4 }
5 int main(void) {
6     int y;
7     foo(&x);
8     foo(&y);
9 }

(b) Non-parameterized points-to graph for foo(): p -> {x, y}

(c) Parameterized points-to graph for foo(): p -> nv, with "nv" bound to "x" at 7 and "nv" bound to "y" at 8

Fig. 1. Program 1 (a), non-parameterized (b) and parameterized (c) points-to graph.

For supporting program analyses, parameterized pointer information has sev- 
eral advantages over non-parameterized pointer information. First, parameter- 
ized pointer information can be used by a program analysis to compute param- 
eterized summary information for a procedure. Such parameterized summary 
information can be instantiated at a callsite to compute more accurate informa- 
tion about the callsite. For example, using parameterized pointer information, 
a program analysis reports that nv is modified by foo() (Figure 1). The program
analysis then instantiates this information at statement 7 by replacing nv
with x, and reports that x is modified by foo() at statement 7. In contrast,
using non-parameterized pointer information, the program analysis reports that
both x and y may be modified by foo(). The program analysis then uses this
information at statement 7 and reports that x and y may be modified by foo()
at statement 7. Second, parameterized pointer information for a procedure is
more compact than non-parameterized pointer information: in the parameter- 
ized pointer information for a procedure, an auxiliary parameter can be used 
to represent a set of memory locations that require several names to identify in 
non-parameterized pointer information. Thus, a program analysis creates and 
propagates less information when using parameterized pointer information. 
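The instantiation step described above (replacing nv by x at statement 7) can be sketched as a lookup in per-callsite binding information. The representation below is hypothetical, not MoPPA's: a summary is a list of names, some of which are auxiliary parameters, and each callsite carries a small table mapping auxiliary parameters to the caller's names.

```c
#include <assert.h>
#include <string.h>

#define MAX_BINDINGS 4

/* Binding of one auxiliary parameter to the caller's name at a callsite. */
typedef struct { const char *aux; const char *actual; } Binding;

/* Hypothetical binding information for one callsite (statement number). */
typedef struct {
    int stmt;
    Binding b[MAX_BINDINGS];
    int n;
} CallsiteBindings;

/* Instantiate one name from a parameterized summary at a callsite.
   Names that are not auxiliary parameters (e.g., globals) pass through. */
const char *instantiate(const char *name, const CallsiteBindings *cs) {
    for (int i = 0; i < cs->n; i++)
        if (strcmp(cs->b[i].aux, name) == 0)
            return cs->b[i].actual;
    return name;
}

/* Instantiate a whole summary (e.g., the names modified by foo()). */
void instantiate_summary(const char *const *summary, int n,
                         const CallsiteBindings *cs, const char **out) {
    for (int i = 0; i < n; i++)
        out[i] = instantiate(summary[i], cs);
}
```

With the summary {nv} for foo(), instantiation yields {x} at statement 7 and {y} at statement 8, whereas a non-parameterized summary would report {x, y} at both callsites.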

The major problem with acceptance of existing algorithms that compute parameterized pointer information is that they are not efficient for analyzing large programs. One reason for this inefficiency is that existing algorithms use a flow-sensitive approach, which may not scale to large programs. A second reason for this inefficiency is that existing algorithms may analyze a procedure more than once. This additional analysis increases the expense of the algorithms. Another problem with acceptance of existing algorithms that compute parameterized pointer information is that none of these algorithms has been compared empirically with algorithms that compute non-parameterized pointer information. Thus, it is unknown how much improvement in precision and performance can be gained by a program analysis that uses parameterized pointer information instead of non-parameterized pointer information.

Footnote: Auxiliary parameters are similar to symbolic names or extended parameters.

This paper presents a modular parameterized pointer analysis algorithm
(MoPPA) that efficiently computes points-to graphs for a program. MoPPA fol- 
lows a three-phase flow-insensitive, context-sensitive pointer analysis framework. 
MoPPA uses, when possible, auxiliary parameters to identify memory locations 
whose addresses are passed into a procedure. MoPPA also distinguishes the mem- 
ory locations that are dynamically allocated in a procedure when the procedure 
is invoked under different calling contexts. 

Compared to other algorithms (e.g., [...]) that are intended to handle
large programs, a major benefit of MoPPA is that it provides parameterized
pointer information. Another benefit of MoPPA over these algorithms is that 
MoPPA can distinguish the memory locations dynamically allocated in a pro- 
cedure under different calling contexts. Therefore, MoPPA may provide more 
precise pointer information than these algorithms. Compared to existing al- 
gorithms that compute parameterized pointer information, a major benefit of 
MoPPA is its efficiency. MoPPA processes each pointer assignment only once. 
By storing global pointer information in one global points-to graph, MoPPA 
propagates, from one procedure to another, only a small amount of information 
related to parameters. Therefore, MoPPA can efficiently compute the points-to 
graphs. Another benefit of MoPPA is its modularity — only the information for 
the procedures within a strongly connected component of the call graph must 
be in memory simultaneously. Thus, MoPPA may require less memory. 

This paper also presents a set of empirical studies. These studies show that, 
on subjects of up to 100,000 lines of code, 

- MoPPA runs in time close to that required by FICS; such time is close to that required by Steensgaard’s algorithm, the most efficient flow-insensitive algorithm.

- Using information provided by MoPPA instead of that provided by FICS or Andersen’s algorithm, (a) a program analysis computes on average 12% (maximum 37.9%) fewer flow dependences for a statement in a procedure; (b) a program analysis runs on average 10 (maximum 210 over FICS, maximum 445 over Andersen’s) times faster, and computes on average 25% (maximum 57%) fewer transitive interprocedural flow dependences for a statement in a program; and (c) a program slicer runs on average 7 (maximum 72 over FICS, maximum 106 over Andersen’s) times faster, and computes on average 12% (maximum 45%) smaller slices.

The studies show that (1) MoPPA is efficient and (2) using parameterized pointer 
information provided by MoPPA can significantly improve the precision and the 
efficiency of many program analyses. 

The significance of this work is that it provides the first algorithm that ef- 
ficiently (within one minute) computes parameterized pointer information for 
programs up to 100,000 lines of code. The work also presents, to the best of our 
knowledge, the first set of empirical studies that compare the results of program 
analyses computed using parameterized pointer information with that computed 
using non-parameterized pointer information. Our studies show that computing 
parameterized pointer information for large programs is feasible and beneficial. 
2 Parameterized Points-to Graphs

This section discusses the points-to graphs constructed by MoPPA and the ap- 
proach used by MoPPA to assign names to memory locations. 



2.1 Points-to Graphs 

MoPPA uses points-to graphs to represent pointer information. In a points-to 
graph, a node represents a set of memory locations whose names are associated 
with the node. A field access edge, labeled with a field name, connects a node representing structures to a node representing a field of the structures. A points-to edge, labeled with [...], represents points-to relations. For example, the edge in Figure 1(b) represents that p points to x or y. For efficiency, MoPPA imposes two constraints on a points-to graph: (1) a memory location can be represented by only one node; (2) labels are unique among the edges leaving a node.
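One way to maintain both constraints, sketched below in C, is to merge nodes whenever an edge would give a node two out-edges with the same label, as unification-based analyses do. This is an assumption for illustration: the text above does not say how MoPPA enforces the constraints. Label 0 stands for the points-to label, nodes are small integers, and all names are hypothetical. Adding p's points-to edges to x and to y merges x and y into one node, which matches the single p-to-{x, y} edge described for Figure 1(b).

```c
#include <assert.h>

#define MAX_NODES 32
#define N_LABELS 4   /* label 0: points-to edge; labels 1..3: field names */

static int parent[MAX_NODES];          /* union-find forest over nodes */
static int out[MAX_NODES][N_LABELS];   /* target per label, or -1 */

void graph_init(void) {
    for (int i = 0; i < MAX_NODES; i++) {
        parent[i] = i;
        for (int l = 0; l < N_LABELS; l++)
            out[i][l] = -1;
    }
}

/* Representative node of n, with path halving. */
int find(int n) {
    while (parent[n] != n) {
        parent[n] = parent[parent[n]];
        n = parent[n];
    }
    return n;
}

static void merge(int a, int b);

/* Add src --label--> dst, merging targets so labels stay unique. */
void add_edge(int src, int label, int dst) {
    assert(label >= 0 && label < N_LABELS);
    src = find(src);
    dst = find(dst);
    if (out[src][label] == -1)
        out[src][label] = dst;
    else
        merge(out[src][label], dst);
}

/* Union two nodes, then re-add b's outgoing edges to the merged node,
   which may trigger further merges of their targets. */
static void merge(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b)
        return;
    parent[b] = a;
    for (int l = 0; l < N_LABELS; l++)
        if (out[b][l] != -1)
            add_edge(a, l, out[b][l]);
}
```

Merging cascades through labeled edges: if two structure nodes are merged, their targets for the same field label are merged as well, so both constraints continue to hold.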

MoPPA computes two kinds of points-to graphs. For a program, MoPPA 
computes a global points-to graph that represents the pointer information re- 
lated to global pointers. For each procedure in the program, MoPPA computes 
a procedural points-to graph that represents the pointer information related to 
the local pointers in the procedure. The separation of global pointer information 
from local pointer information lets MoPPA reduce the amount of information 
that it propagates across procedure boundaries. For example, suppose that at
the beginning of main(), global pointer g is forced to point to x. By making this
information available in the global points-to graph, MoPPA avoids propagating
such information to procedures in which the computation of the pointer
information does not involve g. When analyzing a program, a program
analysis resolves dereferences of global pointers using the global points-to graph. 



2.2 Naming Memory Locations 

MoPPA identifies memory locations in a procedure using three kinds of names: 
auxiliary parameter, local, and quasi-global. MoPPA uses an auxiliary parame- 
ter, when possible, to identify, in procedure P, a memory location whose address 
may be passed into P through formal parameters. An auxiliary parameter, as 
defined in the Introduction, can identify different memory locations under differ- 
ent calling contexts. To support program analyses, MoPPA also provides binding 
information that maps an auxiliary parameter in P to the names that identify 
the same memory locations at a callsite to P. For example, MoPPA uses auxiliary
parameter nv to identify memory locations for x and y in the points-to
graph (Figure 1(c)) for foo() (Figure 1(a)). MoPPA also provides information
to map nv to x at statement 7 and to y at statement 8.

MoPPA uses a local name to identify a memory location that cannot be 
accessed after the procedure returns. A local name is a name whose scope includes 
only one procedure. For example, the memory location for a local variable in 
procedure P may be identified using a local name whose scope includes only P. 



² Similar constraints are also used in Steensgaard's algorithm and FICS.
MoPPA may use a quasi-global name to identify, in a procedure P, the
memory location for a global variable or a memory location whose address can be
passed into P through global pointers. A quasi-global name is a name whose 
scope may include several procedures, but may not include all procedures in a 
program. The scope of a quasi-global name ensures that, if a memory location 
loc is identified using a quasi-global name N in P, then loc is also identified 
using N in P’s callers. Therefore, MoPPA need not propagate the pointer in- 
formation for loc from P to its callers because such information is stored in the 
global points-to graph and can be retrieved, using the same name, when the 
information is needed in P’s callers. At most one quasi-global name can be used 
to identify a memory location in different procedures throughout the program. 

Using quasi-global names to improve efficiency is one of the features that 
distinguish MoPPA from Wilson and Lam's algorithm. In their algorithm,
all memory locations, including global variables and those that are accessed 
through global pointers, are identified using extended parameters (similar to 
auxiliary parameters) in each procedure. Preliminary studies show that many 
large programs may use a large number of global pointers. For such programs, 
propagating information for all global pointers from procedure to procedure 
may be prohibitively expensive. The studies also show that the values of global 
pointers do not change often in a program. Therefore, introducing symbolic 
names to represent the memory locations that are accessed through dereferences 
of global pointers in each procedure might be unnecessary. 

MoPPA uses various rules to determine whether to use an auxiliary param- 
eter, a local name, or a quasi-global name to identify global memory locations 
(global variables), stack-allocated memory locations (local variables), or heap- 
allocated memory locations in a procedure. For each global variable g accessed 
within a procedure P, MoPPA determines whether g is accessed only using its 
address that is passed into P through formal parameters. If this is the case, 
MoPPA uses an auxiliary parameter to identify g in P. For example, MoPPA
uses auxiliary parameter nv to identify x in foo() in Figure 1(a). However, if
g is accessed using its variable name or using an address that is passed into P
through global pointers, then MoPPA uses g's variable name as the quasi-global
name to identify g in P (e.g., x in main() in Figure 1(a)). This quasi-global
name is also used to identify g in P’s direct or indirect callers. 

For a local variable l that is declared in P, if l cannot be accessed through
dereferences of global pointers in the program, MoPPA uses l's variable name as
a local name to identify l in P. In any other procedure where l may be accessed,
MoPPA uses an auxiliary parameter to identify l. However, if l can be accessed
through dereferences of global pointers in the program, MoPPA uses a
quasi-global name to identify l in the procedures into which l's address may be
passed through global pointers. MoPPA also identifies l using this quasi-global
name in the callers to these procedures. In the procedures into which l's address
is passed only through formal parameters or dereferences of formal parameters,
MoPPA uses an auxiliary parameter to identify l.

Identifying local variables with quasi-global names might cause imprecision
in the pointer analysis. A local variable l declared in procedure P can be accessed
only in P or in the procedures that P may directly or indirectly call. However, if
MoPPA uses a quasi-global name N to identify l in the program, then according
to the way in which N's scope is determined, N may appear in procedures
that P may never (directly or indirectly) call. Therefore, a program analysis
may conclude that l may be accessed in these procedures and compute spurious
information. This kind of imprecision can also be introduced by many other
existing algorithms. One way to reduce such imprecision is to
remove, from N's scope, the procedures that P may never (directly or indirectly)
call. However, MoPPA does not include this optimization because preliminary
studies show that few local variables may be pointed to by global pointers.

MoPPA attempts to distinguish the memory locations allocated on the heap 
in a procedure P under different calling contexts. Unlike algorithms that
distinguish these memory locations by extending their names with call
strings, MoPPA makes such a distinction only if the distinction may improve the
precision of program analyses. Suppose that a statement s in P allocates memory
location loc on the heap. If loc's address is not returned to P's callers, then
loc can be accessed only within P, and MoPPA identifies loc using a
local name whose scope includes only P. If loc's address may be returned to
P’s callers through the return value or dereferences of formal parameters, but 
not through global pointers or dereferences of global pointers, MoPPA uses an 
auxiliary parameter to identify loc in P. MoPPA also creates names, using similar 
rules, to identify loc in P’s callers. Because different names may be created to 
identify the memory locations returned by P at different callsites, MoPPA can 
distinguish, in P’s callers, the memory locations allocated at s under different 
calling contexts. However, if loc’s address may be returned to P’s callers through 
global pointers or dereferences of global pointers, MoPPA introduces a quasi- 
global name to identify loc in P and in all callers of P. Therefore, MoPPA does 
not distinguish memory locations allocated at s under different calling contexts. 

For example, let loc be the memory location allocated at statement 14 in
Figure 2(a). Because loc is returned to alloc()'s callers only through *f, MoPPA
uses auxiliary parameter nv2 to identify loc in $G_{alloc()}$ (Figure 2(d)), the
points-to graph for alloc(). When loc is returned to getg() at statement 10, MoPPA
identifies loc using a quasi-global name gh because loc may be returned to
getg()'s callers also through global pointer g. When loc is returned to main()
at statement 3, MoPPA identifies loc using a local name lh because loc cannot
be returned to main()'s callers. Compared to the points-to graphs (Figure 2(e))
constructed by FICS, MoPPA computes more precise pointer information.

In summary, to identify memory locations in the procedures with appropriate 
names, MoPPA first determines the scope of each quasi-global name N:

- Rule 1. If N is the variable name of a global variable and N syntactically
appears in a procedure P, then N's scope includes P.

- Rule 2. If N identifies a memory location pointed to by another memory
location identified by quasi-global name $N_1$ according to the global points-to
graph, and $N_1$'s scope includes P, then N's scope includes P.

- Rule 3. If N's scope includes a procedure P, then N's scope includes all
the procedures that call P.
MoPPA then determines, for a memory location loc accessed in procedure P, 
whether there is a quasi-global name for loc whose scope includes P. If so, 
MoPPA uses this name to identify loc in P. 
Otherwise, MoPPA determines whether loc can be accessed before P is
invoked or after P returns. If that is the case, MoPPA uses an auxiliary parameter
to identify loc in P. Otherwise, MoPPA uses a local name to identify loc in P.
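The naming decision above can be summarized as a small C function. This is a hedged sketch rather than MoPPA's code: `NameKind` and `choose_name` are hypothetical names, and both inputs (whether some quasi-global name for loc has a scope that includes P, as determined by Rules 1-3, and whether loc can be accessed before P is invoked or after P returns) are assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kinds of names for a memory location loc in procedure P. */
typedef enum { NAME_QUASI_GLOBAL, NAME_AUX_PARAM, NAME_LOCAL } NameKind;

/* qg_in_scope:  some quasi-global name for loc has a scope that includes P.
 * live_outside: loc can be accessed before P is invoked or after P returns. */
NameKind choose_name(bool qg_in_scope, bool live_outside)
{
    if (qg_in_scope)
        return NAME_QUASI_GLOBAL;  /* same name as in P's callers */
    if (live_outside)
        return NAME_AUX_PARAM;     /* rebound to caller names at each callsite */
    return NAME_LOCAL;             /* scope includes only P */
}
```

The order of the tests matters: a quasi-global name whose scope includes P takes precedence, so an auxiliary parameter is used only when no such name exists.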

3 Computation of Parameterized Points-to Graphs 

This section introduces some definitions and gives an overview of MoPPA. 

3.1 Definitions 

Memory locations in a program are accessed through object names, each of which
consists of a variable and a possibly empty sequence of dereferences and field
accesses.

Object name $N_1$ is extended from object name $N_2$ if $N_1$ can be constructed
by applying a possibly empty sequence of dereferences and field accesses $\omega$ to
$N_2$; we denote $N_1$ as $\mathcal{E}_{\omega}(N_2)$.

For example, if pointer p points to a struct with field a in a C program, then
$\mathcal{E}_{*}(p)$ is *p and $\mathcal{E}_{*.a}(p)$ is (*p).a.

Given an object name N of pointer type, the points-to node of N in a points-to
graph G is the node that represents the memory locations that may be pointed
to by N. To find the points-to node for N in G, an algorithm first locates or
creates, in G, a node $n_0$ that represents the variable in N. The algorithm then
locates or creates a sequence of nodes $n_i$ and edges $e_i$, $1 \le i \le k$, so that
$n_0, e_1, n_1, \ldots, e_k, n_k$ is a path in G, the labels of $e_1, \ldots, e_{k-1}$ match the
sequence of dereferences and field accesses in N, and $e_k$ is a points-to edge.
The points-to node of N is $n_k$.
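A minimal C sketch of this lookup follows, assuming a fixed-capacity graph in which each node stores its outgoing edges as (label, target) pairs, with the label "*" standing for a points-to edge and any other string for a field name. `Graph`, `step`, and `points_to_node` are hypothetical names. Looking for an existing edge before creating one keeps labels unique among the edges leaving a node, as constraint (2) of Section 2.1 requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_EDGES 8

/* Outgoing edges of one node: "*" labels a points-to edge,
 * any other string labels a field access edge. */
typedef struct {
    const char *label[MAX_EDGES];
    int target[MAX_EDGES];
    int nedges;
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    int nnodes;
} Graph;

int new_node(Graph *g)
{
    assert(g->nnodes < MAX_NODES);
    g->nodes[g->nnodes].nedges = 0;
    return g->nnodes++;
}

/* Follow the edge labeled `label` out of `from`, creating it if absent. */
int step(Graph *g, int from, const char *label)
{
    Node *n = &g->nodes[from];
    for (int i = 0; i < n->nedges; i++)
        if (strcmp(n->label[i], label) == 0)
            return n->target[i];
    assert(n->nedges < MAX_EDGES);
    int to = new_node(g);          /* nodes[] is a fixed array, so n stays valid */
    n->label[n->nedges] = label;
    n->target[n->nedges] = to;
    n->nedges++;
    return to;
}

/* Points-to node of the object name obtained by applying acc[0..nacc-1]
 * ("*" or a field name) to the variable whose node is var_node:
 * follow those labels (e_1..e_{k-1}), then one final points-to edge (e_k). */
int points_to_node(Graph *g, int var_node, const char *const acc[], int nacc)
{
    int cur = var_node;
    for (int i = 0; i < nacc; i++)
        cur = step(g, cur, acc[i]);
    return step(g, cur, "*");
}
```

For (*p).a the sketch walks the path p, *, *p, a, (*p).a, * and returns the last node; repeating a lookup reuses the existing edges instead of adding new ones.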

3.2 Overview of MoPPA 

MoPPA computes a global points-to graph $G_{glob}$ for a program and a procedural
points-to graph $G_P$ for each procedure P. Let g be a global pointer. If a memory
location loc may be pointed to by $\mathcal{E}_{\omega}(g)$ at any point in the program, then a
quasi-global name N identifying loc must be associated with the points-to node
of $\mathcal{E}_{\omega}(g)$ in $G_{glob}$. Let v be a local pointer declared in P. If a memory location loc
may be pointed to by $\mathcal{E}_{\omega}(v)$ at any point in P under any calling context, then a
name N identifying loc must be associated with the points-to node of $\mathcal{E}_{\omega}(v)$ in
$G_P$. Given the points-to graphs, if a memory location loc is identified in P by an
auxiliary parameter, then a program analysis can determine the calling contexts
under which loc may be accessed by looking at the binding information at the
callsite. However, if loc is identified in P by a quasi-global name, a program
analysis must assume that loc may be accessed under each calling context.

In addition to the points-to graphs, MoPPA also computes the set of quasi- 
global names whose scopes may include P according to Rules 1-3 in Section 2. 
MoPPA uses this information to determine the appropriate name for identifying 
a memory location in P. To compute this information, MoPPA first collects the 
global variable names that syntactically appear in P or in procedures directly 
or indirectly called by P. According to Rules 1 and 3, the scopes of these names 
(a) Program 2

 1  main() {
 2    char *p,*q;
 3    alloc(&p);
 4    q=getg();
 5    g=a;
 6  }
 7  char *getg() {
 8    char **t=&g;
 9    if(g==null)
10      alloc(t);
11    return *t;
12  }
13  alloc(char **f) {
14    *f=malloc(4);
15  }
16  char *g,a[4];

(b)-(e) Points-to graphs for Program 2: (b) after the first phase, (c) after the
second phase, (d) computed by MoPPA, (e) computed by FICS.

Fig. 2. Program 2 and its points-to graphs.



include P. MoPPA then searches, beginning at the nodes associated with the
global variable names computed for P, for all reachable nodes in $G_{glob}$. According
to Rule 2, the scopes of the names associated with these nodes include P.
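The two steps just described (collecting GVars[P], then searching $G_{glob}$ from the nodes of those variables) can be sketched in C as follows. All identifiers are hypothetical; names are encoded as bits of an unsigned bitmask and both graphs as adjacency matrices sized for small examples. One liberty is taken: GVars is computed by iterating to a fixed point over the call graph, which also covers recursive procedures, instead of using the bottom-up worklist.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXP 16        /* hypothetical bound on procedures */
#define MAXG 16        /* hypothetical bound on nodes of G_glob */
#define MAXNAMES 32    /* names are bits of an unsigned bitmask */

typedef struct {
    int np, ng;                      /* number of procedures / G_glob nodes */
    bool calls[MAXP][MAXP];          /* calls[P][Q]: P contains a callsite to Q */
    unsigned appears[MAXP];          /* global variable names appearing in P */
    int var_node[MAXNAMES];          /* G_glob node holding global variable v, or -1 */
    unsigned node_names[MAXG];       /* names associated with each G_glob node */
    bool edge[MAXG][MAXG];           /* some edge from node i to node j in G_glob */
} Input;

/* GVars[P]: names appearing in P or in any direct or indirect callee (Rules 1, 3). */
void compute_gvars(const Input *in, unsigned gvars[])
{
    for (int p = 0; p < in->np; p++)
        gvars[p] = in->appears[p];
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < in->np; p++)
            for (int q = 0; q < in->np; q++)
                if (in->calls[p][q] && (gvars[p] | gvars[q]) != gvars[p]) {
                    gvars[p] |= gvars[q];
                    changed = true;
                }
    }
}

/* Names whose scope includes P: every name on a G_glob node reachable
 * from the nodes of the variables in GVars[P] (Rule 2). */
unsigned scope_names(const Input *in, unsigned gvars_p)
{
    bool seen[MAXG] = { false };
    int stack[MAXG], top = 0;
    unsigned result = 0;
    for (int v = 0; v < MAXNAMES; v++) {
        int n = ((gvars_p >> v) & 1u) ? in->var_node[v] : -1;
        if (n >= 0 && !seen[n]) {
            seen[n] = true;
            stack[top++] = n;
        }
    }
    while (top > 0) {                /* depth-first search over G_glob */
        int n = stack[--top];
        result |= in->node_names[n];
        for (int m = 0; m < in->ng; m++)
            if (in->edge[n][m] && !seen[m]) {
                seen[m] = true;
                stack[top++] = m;
            }
    }
    return result;
}
```

For Program 2, assuming $G_{glob}$ has a node for g with an edge to a node holding a and gh (from statement 5 and the heap location reached through g), the names g, a, and gh all get scopes that include main() and getg() but not alloc(), consistent with the later observation that the scope of the quasi-global name for g does not include alloc().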

MoPPA performs two major tasks in construction of the points-to graphs. 
The first task detects each pair of object names that may point to common mem- 
ory locations. MoPPA merges the points-to nodes of these two object names in 
a points-to graph to ensure that each common memory location pointed to by 
these two object names is represented by only one node. This merging operation 
is a variant of the “join” in Steensgaard’s algorithm. The second task determines
the memory locations represented by each node in the points-to graphs.
MoPPA picks appropriate names to identify these memory locations at the node.
MoPPA computes the points-to graphs in three phases (Figure 3).

First Phase (Lines 1-8). In the first phase, MoPPA processes each pointer
assignment lhs = rhs in each procedure P to build $G_P$.³ If rhs is an object name,
then MoPPA merges the points-to nodes of lhs and rhs in $G_P$ to capture the
fact that lhs and rhs point to the same memory location after this assignment
(line 3). If rhs is an address-taking expression “&x”, then MoPPA adds variable
name x to the points-to node of lhs in $G_P$ to indicate that lhs points to x after
the assignment (line 4). If rhs calls a memory allocation function, MoPPA sets
a boolean flag HasHeap at the points-to node of lhs in $G_P$ (line 5). HasHeap of a
node indicates that the node represents a heap-allocated memory location whose
name has not yet been determined by MoPPA. In various phases of MoPPA,
when two nodes $N_1$ and $N_2$ are merged, if HasHeap of $N_1$ or $N_2$ is set, then
HasHeap of the resulting node is set. Figure 2(b) shows the points-to graphs
constructed by MoPPA during this phase for Program 2 in Figure 2(a). Solid
nodes in the graphs indicate that HasHeap of these nodes is set.

³ MoPPA represents the return value of each function with a variable and treats return
statements as assignments (e.g., statement 11 in Figure 2(a) is treated as getg=*t).
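The first phase can be sketched in C with a union-find structure in the style of Steensgaard's join, of which MoPPA's merge is described as a variant. This is a simplified, hypothetical sketch: object names are restricted to plain variables, names are bitmask bits, the label set is fixed, and merging two nodes unions their names, ORs their HasHeap flags, and recursively merges successors that carry the same label. None of the identifiers come from the paper.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64
#define NLABELS 2      /* hypothetical label set: 0 = points-to ("*"), 1 = one field */
#define STAR 0

static int parent[MAX_NODES];          /* union-find forest over graph nodes */
static int out[MAX_NODES][NLABELS];    /* successor per label, -1 if none */
static unsigned names[MAX_NODES];      /* names associated with a node (bitmask) */
static bool has_heap[MAX_NODES];       /* the HasHeap flag */
static int nnodes;

static int new_node(void)
{
    assert(nnodes < MAX_NODES);
    int n = nnodes++;
    parent[n] = n;
    for (int l = 0; l < NLABELS; l++)
        out[n][l] = -1;
    names[n] = 0;
    has_heap[n] = false;
    return n;
}

static int find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Merge two nodes: union names, OR HasHeap, and recursively merge
 * successors that carry the same label (Steensgaard-style join). */
static void merge(int a, int b)
{
    a = find(a);
    b = find(b);
    if (a == b)
        return;
    parent[b] = a;
    names[a] |= names[b];
    has_heap[a] = has_heap[a] || has_heap[b];
    for (int l = 0; l < NLABELS; l++) {
        int ta = out[a][l], tb = out[b][l];
        if (tb < 0)
            continue;
        if (ta < 0)
            out[a][l] = tb;
        else
            merge(ta, tb);
    }
}

/* Points-to node of variable v, created on demand. */
static int pt(int v)
{
    v = find(v);
    if (out[v][STAR] < 0)
        out[v][STAR] = new_node();
    return find(out[v][STAR]);
}

/* Lines 3-5 of Fig. 3, for assignments whose sides are plain variables. */
static void assign_copy(int lhs, int rhs) { int a = pt(lhs); int b = pt(rhs); merge(a, b); }
static void assign_addr(int lhs, int x)   { names[pt(lhs)] |= 1u << x; }
static void assign_malloc(int lhs)        { has_heap[pt(lhs)] = true; }
```

With p = &x and q = p, the two points-to nodes coincide and carry x; a later q = r, where r holds the result of malloc(), makes HasHeap visible through p as well, following the merge rule stated above.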

Second Phase (Lines 9-26). In the second phase, MoPPA processes the callsites
in each procedure P to consider the effects, on $G_P$, of the procedures called
by P. Let c be a callsite that calls Q in P. MoPPA first calls BindFromCallee()
to search in $G_Q$ for object names $\mathcal{E}_{\omega_1}(p)$ and $\mathcal{E}_{\omega_2}(q)$ that point to the same node.
If p and q are formal parameters bound to $a_1$ and $a_2$ respectively at c, then
after c is executed, $\mathcal{E}_{\omega_1}(a_1)$ and $\mathcal{E}_{\omega_2}(a_2)$ may point to the same memory location.
Thus, BindFromCallee() merges the points-to nodes of $\mathcal{E}_{\omega_1}(a_1)$ and $\mathcal{E}_{\omega_2}(a_2)$ in
$G_P$. If p is a formal parameter bound to a at c and q is a global pointer, then
after c is executed, $\mathcal{E}_{\omega_1}(a)$ and $\mathcal{E}_{\omega_2}(q)$ may point to the same memory location.
Thus, BindFromCallee() merges the points-to nodes of $\mathcal{E}_{\omega_1}(a)$ and $\mathcal{E}_{\omega_2}(q)$ in $G_P$.
For example, when MoPPA processes statement 4 in Figure 2(a), it merges the
points-to nodes of q and g in $G_{main()}$ because getg and g point to the same node
in $G_{getg()}$ (return value getg is treated as a formal parameter in this phase).

MoPPA also determines the memory locations whose addresses may be
returned to P at c (lines 14-17). If $G_Q$ shows that a name x is associated with the
points-to node of $\mathcal{E}_{\omega}(f)$, in which f is a formal parameter bound to a at c, then,
after c is executed, $\mathcal{E}_{\omega}(a)$ may point to x. MoPPA adds x to the points-to node
of $\mathcal{E}_{\omega}(a)$ in $G_P$ (line 15). If HasHeap of the points-to node of $\mathcal{E}_{\omega}(f)$ is set, then
after c is executed, $\mathcal{E}_{\omega}(a)$ may point to a heap-allocated memory location. Thus,
MoPPA sets HasHeap of the points-to node of $\mathcal{E}_{\omega}(a)$ (line 16). For example, when
MoPPA processes statement 3 in Figure 2(a), it sets HasHeap of p's points-to
node in $G_{main()}$ because HasHeap of *f's points-to node is set in $G_{alloc()}$.

In the second phase, MoPPA also constructs the global points-to graph $G_{glob}$
using information in $G_P$ (lines 19-23). MoPPA first calls BindToGlobal() to
search in $G_P$ for object names $\mathcal{E}_{\omega_1}(g_1)$ and $\mathcal{E}_{\omega_2}(g_2)$, where $g_1$ and $g_2$ are global
variables, that point to the same node. BindToGlobal() merges the points-to
nodes of $\mathcal{E}_{\omega_1}(g_1)$ and $\mathcal{E}_{\omega_2}(g_2)$ in $G_{glob}$. MoPPA then determines the memory
locations that may be pointed to by object names extended from global pointers
(lines 20-23). Let g be a global pointer. If $G_P$ shows that HasHeap of the
points-to node of $\mathcal{E}_{\omega}(g)$ is set, then MoPPA creates a new quasi-global name to
identify the heap-allocated memory location associated with this node. MoPPA
resets HasHeap and adds the new name to the points-to node of $\mathcal{E}_{\omega}(g)$ in $G_P$
(line 21). For example, when MoPPA processes getg() in Figure 2(a) in the second
phase, it finds that HasHeap of the points-to node of g is set. Thus, the algorithm
creates a name gh, resets HasHeap, and adds gh to the points-to node of g in
$G_{getg()}$. MoPPA also propagates names associated with the points-to node of
$\mathcal{E}_{\omega}(g)$ in $G_P$ to the points-to node of $\mathcal{E}_{\omega}(g)$ in $G_{glob}$ (line 22).

In the second phase, MoPPA further computes GVars[P], the set of global
variable names that appear syntactically in P or in P's direct or indirect callees
(line 26). In this phase, MoPPA processes the procedures in a reverse topological (bottom-up)
Algorithm MoPPA
input    𝒫: the program to be analyzed
output   a set of points-to graphs
declare  GVars[P]: global variable names collected for P
         A_c(f): return the actual parameter bound to f at c
         globals(G): return the global variable names in G
begin MoPPA
 1. foreach pointer assignment lhs = rhs in each procedure P do
 2.    case rhs do
 3.       object name: merge points-to nodes of lhs and rhs in G_P
 4.       “&x”: add x to the points-to node of lhs in G_P
 5.       malloc(): set HasHeap of the points-to node of lhs in G_P
 6.    endcase
 7. endfor
 8. add global variable names in each procedure P to GVars[P]
 9. add all procedures in 𝒫 to worklists W_1 and W_2
10. while W_1 ≠ ∅ do   /* W_1: sorted in reverse topological order */
11.    remove P from the head of W_1
12.    foreach callsite c to Q in P do
13.       BindFromCallee(G_Q, globals(G_Q), G_P, c)
14.       foreach points-to node N of E_ω(f) in G_Q where f is a formal parameter do
15.          copy names from N to E_ω(A_c(f))'s points-to node in G_P
16.          if HasHeap of N is set then set HasHeap of E_ω(A_c(f))'s points-to node in G_P
17.       endfor
18.    endfor
19.    BindToGlobal(G_P, globals(G_P), G_glob)
20.    foreach points-to node N of E_ω(g) in G_P, g is global do
21.       if HasHeap of N is set then reset HasHeap of N and add a new name to N
22.       copy names from N to the points-to node of E_ω(g) in G_glob
23.    endfor
24.    if G_P is updated then add P's callers to W_1
25. endwhile
26. compute GVars[P] for each procedure P using information from P's callees
27. foreach procedure P do
28.    compute the quasi-global names whose scopes include P
29. while W_2 ≠ ∅ do   /* W_2: sorted in topological order */
30.    remove P from the head of W_2
31.    foreach callsite c to P in P' do
32.       BindFromCaller(G_P', c, G_P)
33.       foreach name n at the node of E_ω(a) in G_P', where a is the actual bound to f at c do
34.          if n is a quasi-global name whose scope includes P then
35.             add n to E_ω(f)'s points-to node in G_P
36.          elseif no auxiliary parameter at E_ω(f)'s node in G_P and no auxiliary parameter to reuse then
37.             create an auxiliary parameter at E_ω(f)'s points-to node in G_P
38.          endif
39.       endfor
40.    endfor
41.    BindFromGlobal(G_glob, globals(G_P), G_P)
42.    foreach name n at E_ω(g)'s node in G_glob where g is a global pointer appearing in G_P do
43.       add n to E_ω(g)'s points-to node in G_P
44.    foreach node N whose HasHeap is set in G_P do
45.       reset HasHeap
46.       if no auxiliary parameter associated with N then add a new local name to N
47.    endfor
48.    if G_P is updated then add P's callees to W_2
49. endwhile
50. foreach callsite c in 𝒫 do compute binding information at c
end MoPPA

Fig. 3. MoPPA algorithm.



order on the strongly-connected components of the call graph using a worklist.
Figure 2(c) shows the points-to graphs for Program 2 after this phase.
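The bottom-up order used by the second-phase worklist can be sketched as a depth-first postorder over the call graph, in which every callee is emitted before its callers. This sketch assumes an acyclic call graph; MoPPA orders strongly-connected components, so recursive procedures would first have to be collapsed into components, which is omitted here. `bottom_up_order` and `dfs` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXP 16   /* hypothetical bound on the number of procedures */

/* Emit p after every procedure reachable from it (postorder). */
static void dfs(int p, int np, bool calls[][MAXP], bool visited[], int order[], int *len)
{
    visited[p] = true;
    for (int q = 0; q < np; q++)
        if (calls[p][q] && !visited[q])
            dfs(q, np, calls, visited, order, len);
    order[(*len)++] = p;
}

/* Fill order[] so that every callee precedes its callers (bottom-up);
 * assumes an acyclic call graph. Returns the number of procedures. */
int bottom_up_order(int np, bool calls[][MAXP], int order[])
{
    bool visited[MAXP] = { false };
    int len = 0;
    for (int p = 0; p < np; p++)
        if (!visited[p])
            dfs(p, np, calls, visited, order, &len);
    return len;
}
```

For Program 2 (main() calls getg() and alloc(), getg() calls alloc()) the order is alloc(), getg(), main(). Reading the array from the end gives a top-down order of the kind used by the third phase.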

Third Phase (Lines 27-50). In the third phase, MoPPA processes each procedure
P to assign appropriate names to identify the memory locations represented
by each node in $G_P$. MoPPA completes this task in four steps. First, MoPPA
computes, by using $G_{glob}$ and GVars[P], the set of quasi-global names whose
scopes include P (lines 27-28).

Second, MoPPA processes each callsite c that calls P to capture the pointer
information introduced by parameter bindings. Let P' be the procedure that
contains c. MoPPA first calls BindFromCaller() to search for object names
$\mathcal{E}_{\omega_1}(a_1)$ and $\mathcal{E}_{\omega_2}(a_2)$, in which $a_1$ and $a_2$ are bound to $f_1$ and $f_2$ respectively at c,
that point to the same node in $G_{P'}$ (line 32). MoPPA merges the points-to nodes
of $\mathcal{E}_{\omega_1}(f_1)$ and $\mathcal{E}_{\omega_2}(f_2)$ in $G_P$. MoPPA also determines the memory locations that
may be pointed to by object names extended from formal parameters (lines 33-39).
Let a be an actual parameter that is bound to formal parameter f at c,
and n be a name identifying memory location loc in $G_{P'}$. If n is associated with
the points-to node of $\mathcal{E}_{\omega}(a)$ in $G_{P'}$, then when P is invoked at c, $\mathcal{E}_{\omega}(f)$ may
point to loc at P's entry. If n is a quasi-global name whose scope includes P,
then loc must be identified by n in P. Thus, MoPPA adds n to the points-to
node of $\mathcal{E}_{\omega}(f)$ in $G_P$. Otherwise, n is not a quasi-global name or n is a
quasi-global name but n's scope does not include P. In this case, loc is identified
in P with an auxiliary parameter. MoPPA checks to see whether there is an auxiliary
parameter associated with the points-to node of $\mathcal{E}_{\omega}(f)$ in $G_P$. If no auxiliary
parameter exists, then MoPPA further checks the k-limiting restriction using
the approach described in Subsection 3.4. If MoPPA cannot reuse an existing
auxiliary parameter, it creates a new auxiliary parameter and adds this auxiliary
parameter to this node. For example, when MoPPA processes statement 10 in
Figure 2(a), it finds that actual parameter t may point to g. Because the scope
of the quasi-global name for g does not include alloc(), MoPPA introduces
auxiliary parameter nv1 to identify this memory location and adds nv1 to the
points-to node of f in $G_{alloc()}$. Note that in the third phase, if two nodes $N_1$ and
$N_2$ are merged, at most one auxiliary parameter is kept in the resulting node.
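Lines 33-38, together with the merging rule just stated, can be sketched for a single points-to node of $\mathcal{E}_{\omega}(f)$ as below. The sketch is hypothetical: names are bitmask bits, auxiliary parameters are integer ids, the k-limiting reuse check of Subsection 3.4 is omitted, and keeping the auxiliary parameter already on the target node during a merge is an assumption, since the text only states that at most one is kept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the points-to node of E_w(f) in G_P. */
typedef struct {
    unsigned names;   /* quasi-global names on the node (bitmask) */
    int aux;          /* auxiliary parameter id, or -1 if none */
} FNode;

/* Process one name n found at the points-to node of E_w(a) in the caller. */
void bind_name(FNode *node, int n, bool in_scope_of_p, int *next_aux)
{
    if (in_scope_of_p)
        node->names |= 1u << n;          /* line 35: keep the quasi-global name */
    else if (node->aux < 0)
        node->aux = (*next_aux)++;       /* line 37: create an auxiliary parameter */
    /* otherwise the existing auxiliary parameter already covers loc */
}

/* Merge `from` into `into`, keeping at most one auxiliary parameter. */
void merge_fnodes(FNode *into, const FNode *from)
{
    into->names |= from->names;
    if (into->aux < 0)
        into->aux = from->aux;
}
```

Because a second out-of-scope name reuses the auxiliary parameter already on the node, one auxiliary parameter can stand for several caller-side names, which is what the binding information at each callsite later resolves.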

Third, MoPPA further determines, by examining $G_{glob}$, the memory locations
that may be represented by nodes in $G_P$ (lines 41-43). Let $g_1$ and $g_2$ be global
variable names that appear in $G_P$ (i.e., $g_1, g_2 \in globals(G_P)$). MoPPA calls
BindFromGlobal() to search, in $G_{glob}$, for object names $\mathcal{E}_{\omega_1}(g_1)$ and $\mathcal{E}_{\omega_2}(g_2)$
that point to the same node. BindFromGlobal() merges the points-to nodes of
$\mathcal{E}_{\omega_1}(g_1)$ and $\mathcal{E}_{\omega_2}(g_2)$ in $G_P$. Let g be a global variable name that appears in $G_P$.
If $G_{glob}$ shows that name n is associated with the points-to node of $\mathcal{E}_{\omega}(g)$, then
MoPPA adds n to the points-to node of $\mathcal{E}_{\omega}(g)$ in $G_P$.

Fourth, MoPPA assigns names for the unnamed heap-allocated memory
locations represented by nodes in $G_P$ (lines 44-47). MoPPA examines, in $G_P$,
each node N whose HasHeap is set. If an auxiliary parameter aux is associated
with N, then N is pointed to by an object name extended from formal
parameters. Therefore, the heap-allocated memory locations associated with N
may be returned to P's callers. MoPPA reuses aux to identify these memory
locations. However, if no auxiliary parameter is associated with N, then these
heap-allocated memory locations are not returned to P's callers. MoPPA creates
a new local name to identify these memory locations and adds this name
to N. In both cases, MoPPA resets HasHeap of N. For example, MoPPA discovers
that, in $G_{alloc()}$, the points-to graph for alloc() in Figure 2(a), HasHeap
of the points-to node of *f is set and an auxiliary parameter nv2 is associated
with this node. Therefore, it reuses nv2 to identify the heap-allocated memory
(a) Program 3

typedef struct L {
  ...
  struct L *next;
} List;

P(List *tl, int ln) {
  ...
}

G0(List *t, int ln) {
  if(ln<0) return;
  c2: P(t,ln-1);
}

(b) k-limiting points-to graphs (k=1) for P() and G0(), with the parameter
bindings at c1 and c2.

Fig. 4. Example program 3 (a) and its k-limiting points-to graphs (b).



locations represented by this node. In another case, MoPPA discovers that, in
$G_{main()}$, HasHeap of the points-to node of p is set but no auxiliary parameter is
associated with this node. Therefore, it creates a local name lh to identify the
heap-allocated memory locations represented by this node (Figure 2(d)).
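The fourth step (lines 44-47) amounts to a single pass over the nodes of $G_P$, sketched below in C. The node layout and the names `HNode` and `name_heap_nodes` are hypothetical; auxiliary parameters and local names are represented by integer ids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of a node of G_P for the fourth step. */
typedef struct {
    bool has_heap;   /* HasHeap flag */
    int aux;         /* auxiliary parameter id, or -1 if none */
    int local;       /* local name id added in this step, or -1 */
} HNode;

/* Lines 44-47: name the heap locations still marked by HasHeap. */
void name_heap_nodes(HNode nodes[], int n, int *next_local)
{
    for (int i = 0; i < n; i++) {
        if (!nodes[i].has_heap)
            continue;
        nodes[i].has_heap = false;               /* reset HasHeap */
        if (nodes[i].aux < 0)                    /* not returned to P's callers */
            nodes[i].local = (*next_local)++;    /* add a new local name */
        /* else: the existing auxiliary parameter identifies these locations */
    }
}
```

A node like the one for *f in $G_{alloc()}$ (HasHeap set, auxiliary parameter nv2) keeps its auxiliary parameter, while a node like the one for p in $G_{main()}$ (HasHeap set, no auxiliary parameter) receives a fresh local name, as lh does in the example.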

In the third phase, MoPPA processes the procedures in a topological (top- 
down) order on the strongly-connected components of the call graph using a 
worklist. After all the points-to graphs stabilize, MoPPA processes each callsite 
c to compute the binding information between the names in the procedure con- 
taining c and the auxiliary parameters in the called procedure (line 50). This 
step can be done on-demand when the pointer information is used. 

Figure 2(d) shows the points-to graphs that MoPPA computes for Program
2. Compared to the points-to graphs (Figure 2(e)) constructed by FICS for this
program, MoPPA computes more compact and more precise pointer information.

3.3 Complexity of MoPPA 

Let p be the number of procedures in a program 𝒫, c be the number of callsites in
𝒫, and S be the worst-case actual size of a procedural points-to graph. Without
considering the cost of line 8 and lines 26-28, the time complexity of MoPPA is
the same as that of FICS, which is $O(N \cdot S \cdot \alpha(N \cdot S, p \cdot S))$ [11 5], where $\alpha$ is the
inverse Ackermann function and N is $c+p$ in the absence of recursion and $(c+p) \cdot S$
in the presence of recursion. The steps taken at lines 8 and 26 are similar to those
taken in the modification side-effect analysis. Thus, the time required by these
two lines is $O(n^2)$. Line 28 can be done by first mapping the names in GVars[P]
to the nodes in $G_{glob}$, and then searching in $G_{glob}$ beginning at these nodes.
Thus, the time required by line 28 is $O(n + S_{glob})$, where $S_{glob}$ is the size of $G_{glob}$.
The time complexity of MoPPA is $O(p \cdot (n + S_{glob}) + n^2 + N \cdot S \cdot \alpha(N \cdot S, p \cdot S))$.

3.4 Handling Recursive Data Structures and Indirect Calls 

MoPPA uses a variant of k-limiting to handle recursive data structures. The
variant limits the number of consecutive suspicious nodes (nodes associated
with only auxiliary parameters) on a simple path⁴ to k (field nodes are not
counted). The restriction is checked only when MoPPA processes a recursive
call. For example, when MoPPA binds nv2 to G0() at callsite c1 in P() (Figure
4(a)), it attempts to add a new auxiliary parameter in $G_{G0()}$, which would create
a simple path with two consecutive suspicious nodes. Thus, when k is 1, MoPPA
reuses nv1 and binds nv2 to nv1. Figure 4(b) shows the resulting points-to
graphs. Note that because nv2 is supposed to be bound to the memory locations
pointed to by nv1.next, MoPPA creates a new edge in the graph from the node
representing nv1.next to the node representing nv1.

⁴ A simple path does not contain two identical nodes.
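The k-limiting test can be sketched as a scan along a simple path that counts consecutive suspicious nodes while skipping field nodes, so that a field node neither counts toward nor interrupts a run. This illustrative C sketch assumes the path has already been extracted as an array of node kinds; `NodeKind`, `longest_suspicious_run`, and `within_k_limit` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { NODE_NAMED, NODE_SUSPICIOUS, NODE_FIELD } NodeKind;

/* Longest run of consecutive suspicious nodes along a simple path;
 * field nodes are skipped rather than counted. */
int longest_suspicious_run(const NodeKind path[], int len)
{
    int best = 0, run = 0;
    for (int i = 0; i < len; i++) {
        if (path[i] == NODE_FIELD)
            continue;                        /* field nodes are not counted */
        run = (path[i] == NODE_SUSPICIOUS) ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* A new auxiliary parameter is acceptable only if the path stays within k. */
bool within_k_limit(const NodeKind path[], int len, int k)
{
    return longest_suspicious_run(path, len) <= k;
}
```

In the example, the path from nv1's node through the field node for nv1.next to a new auxiliary parameter has the kinds suspicious, field, suspicious; the run length is 2, so with k = 1 the new auxiliary parameter is rejected and nv1 is reused.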

MoPPA can use one of the following two solutions to handle programs that 
contain indirect calls through function pointers. The first solution uses the call 
graph computed by another algorithm, such as Steensgaard’s algorithm. The 
second solution begins the analysis with a partial call graph, and computes the 
complete call graph during the analysis. This approach requires iterations
between the bottom-up phase and the top-down phase and thus increases
the complexity of MoPPA. To use the second approach, MoPPA keeps an extra
shadow points-to graph $\hat{G}_P$ for each procedure P to separate the summary
information about P from the pointer information computed for P.⁵ In the first
phase, MoPPA puts the pointer information into both $G_P$ and $\hat{G}_P$. In the second
phase, MoPPA uses $\hat{G}_P$ to update both $G_Q$ and $\hat{G}_Q$ if P is called by Q, and uses
$\hat{G}_P$ to update $G_{glob}$. In the third phase, MoPPA uses only the normal points-to
graphs for the procedures. At the end of the third phase, MoPPA first examines
each indirect call. If MoPPA discovers new callees, it expands the call graph and
repeats the second and third phases starting only from the affected procedures.
Otherwise, the algorithm computes the binding information and terminates.



4 Empirical Studies 

We implemented a prototype of MoPPA using the PROLANGS Analysis Framework
(PAF). Our prototype resolves function pointers using Steensgaard's
algorithm. The prototype is parameterized to treat a structure as either an 
atomic memory location or a collection of fields. In the latter case, the proto- 
type does not account for accesses that require knowledge of the physical layout 
of a structure. This limitation may affect the safety of the pointer information. 
However, it should not significantly affect the validity of our studies because 
(1) this kind of access is rare in programs, and (2) all the algorithms that we 
compare are implemented in the same way. More sophisticated techniques will 
be used to handle such accesses in our future work. 

We have performed several empirical studies to evaluate the performance of 
MoPPA and the effectiveness of using the parameterized pointer information 
provided by MoPPA in program analyses. We compared MoPPA with FICS and
Andersen's algorithm. Other studies show that pointer information
computed by these two algorithms is very close in precision to that of many other
existing algorithms, including flow-sensitive algorithms. In addition, MoPPA is
implemented using the same framework as FICS. Thus, comparison between
⁵ $\hat{G}_P$ may be eliminated if MoPPA can determine the procedures that are directly or
indirectly called by P.
Table 1. Left: Sizes of the subject programs. Right: Time in seconds for
MoPPA (Tm) and FICS (Tf).

                    Subject size            Time (s)
program         LOC   Nodes  Procs        Tm      Tf
dixie          2100    1357     52      0.30    0.19
assem          2510    1993     58      0.67    0.44
smail          3212    2430     59      0.43    0.30
lharc          3235    2539     89      0.51    0.25
simulate       3558    2992    114      0.50    0.27
flex           6902    3762     93      0.89    0.32
rolo           4748    3874    142      0.90    0.54
space         11474    5601    137      1.75    1.36
spim          24322   11352    263      3.43    2.49
mpgplay       17263   11864    135      3.68    2.36
espresso      12864   15351    306      6.01    4.61
moria         25002   20316    482      7.70    3.34
twmc          23922   22167    247      3.92    4.24
nethack       32119   31703    701      48.2     132
povray3†     101033   47254   1216      24.1    8.52

Table 2. Left: Number of distinguishable names for heap-allocated locations.
Right: Average number of flow dependences for a statement.

              # of heap names     # of dependences
program        MoPPA    FICS      MoPPA    FICS   Reduction
dixie             10       7       5.10    6.19       17.7%
assem             17      15       2.68    3.62       25.8%
smail             12       7       2.60    3.20       19.0%
lharc              3       3       2.47    2.52        2.0%
simulate           3       3       2.49    2.58        3.4%
flex              39       7       5.85    6.57       11.0%
rolo              27      10       3.96    4.41       10.3%
space             11      11       2.83    3.03        6.5%
spim             131      17      26.6‡   42.8‡       37.9%
mpgplay           64      58       6.94    6.97        0.4%
espresso         238     111       4.10    4.31        4.8%
moria              1       1       9.17   11.66       21.3%
twmc             144     113       4.26    4.51        5.6%
nethack           17       2       5.00    5.40        7.3%
povray3          147      81      10.01   12.12       17.4%

† Structures in povray3 are treated as atomic memory locations.
‡ spim contains two large procedures with over 1000 nodes. Without considering
these two procedures, the result for MoPPA is 2.83, and the result for FICS is 2.95.



MoPPA and FIGS can reveal the extra cost required to perform the sophisti- 
cated naming scheme used to obtain parameterized pointer information. 

We collected the data for the studies on a Sun Ultra 30 workstation with 640MB of physical memory. Structures in programs other than povray3 are treated as collections of fields. However, structures in povray3 are treated as atomic memory locations because our system exhausts available memory otherwise. To capture the pointer information introduced by calls to library functions, the prototype used a set of stubs to simulate these functions.

The left side of Table 1 shows our subject programs. For each program, column LOC shows the lines of code (comments included), column Nodes shows the number of control-flow-graph nodes, and column Procs shows the number of procedures. These subjects have also been used in other studies [?].



4.1 Study 1 

The goal of this study is to evaluate the performance of MoPPA. To investigate the time efficiency of MoPPA, we compared the time to run MoPPA and the time to run FIGS on each subject. The right side (Tm, Tf) of Table 1 shows the comparison. The times shown in the table exclude the time to parse and to resolve function pointers for each subject. The table shows that, for our subjects, although MoPPA can be up to about three times slower than FIGS, it is still very efficient. This result suggests that MoPPA may scale to large programs as well as FIGS does. Note that MoPPA is faster than FIGS on twmc and nethack because MoPPA propagates less information from procedure to procedure.

Other studies (e.g., [?]) that report results on large programs also treat structures as atomic memory locations.

Similar stubs have been used in other studies (e.g., [14, 15]).
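A quick arithmetic check of the two timing claims above, using only the Tm and Tf columns transcribed from Table 1: the per-program ratio Tm/Tf peaks a little below 3 (at povray3), and only twmc and nethack have a ratio below 1. The dictionary and function names are ours, introduced only for this illustration.

```python
# (Tm, Tf) running times in seconds, transcribed from Table 1.
TIMES = {
    "dixie": (0.30, 0.19), "assem": (0.67, 0.44), "smail": (0.43, 0.30),
    "lharc": (0.51, 0.25), "simulate": (0.50, 0.27), "flex": (0.89, 0.32),
    "rolo": (0.90, 0.54), "space": (1.75, 1.36), "spim": (3.43, 2.49),
    "mpgplay": (3.68, 2.36), "espresso": (6.01, 4.61), "moria": (7.70, 3.34),
    "twmc": (3.92, 4.24), "nethack": (48.2, 132.0), "povray3": (24.1, 8.52),
}


def slowdown(program: str) -> float:
    """Ratio Tm / Tf; a value above 1 means MoPPA took longer than FIGS."""
    tm, tf = TIMES[program]
    return tm / tf


worst = max(TIMES, key=slowdown)
faster = sorted(p for p in TIMES if slowdown(p) < 1)
print(f"largest slowdown: {worst} ({slowdown(worst):.2f}x)")
print("MoPPA faster on:", faster)
```

The ratio is computed per program rather than over summed times, so one large subject cannot dominate the comparison.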

In the study, we also investigated the effectiveness of MoPPA in distinguishing memory locations allocated on the heap by a procedure under different calling contexts. The left side of Table 2 compares the number of distinguishable names for heap-allocated memory locations used by MoPPA (Mo) or by FIGS (FI). Two names are distinguishable in a program if, according to the pointer information, the memory locations identified by the names are accessed at different sets of statements. A program analysis may compute more precise information when it uses pointer information consisting of more distinguishable names for heap-allocated memory locations. In FIGS, we considered the artificial names that represent heap-allocated memory locations. In MoPPA, we considered the quasi-global names and local names that are created for heap-allocated memory locations. The table shows that, for several programs (e.g., spim), MoPPA identifies many more distinguishable names for heap-allocated memory locations than FIGS. Thus, a program analysis may compute more precise information when using pointer information provided by MoPPA.
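The definition of distinguishable names can be made concrete by grouping names by the set of statements at which their locations are accessed. The sketch below assumes one plausible way of counting, namely the number of such groups; the paper does not spell out its counting procedure, and the heap names and access sets in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def distinguishable_classes(access_sets: Dict[str, Iterable[int]]) -> List[List[str]]:
    """Group heap names whose locations are accessed at exactly the same statements.

    access_sets maps a heap name to the ids of the statements at which, according
    to the pointer information, its location may be accessed. Names in different
    groups are distinguishable; names in the same group are not.
    """
    groups = defaultdict(list)
    for name, stmts in access_sets.items():
        groups[frozenset(stmts)].append(name)
    return sorted(sorted(g) for g in groups.values())


def count_distinguishable(access_sets: Dict[str, Iterable[int]]) -> int:
    return len(distinguishable_classes(access_sets))


# Hypothetical: one allocation site, named once vs. once per calling context.
single_name = {"heap@alloc": {1, 2, 3, 4}}
per_context = {"heap@alloc/ctxA": {1, 2}, "heap@alloc/ctxB": {3, 4}, "heap@alloc/ctxC": {1, 2}}
print(count_distinguishable(single_name))
print(count_distinguishable(per_context))  # ctxA and ctxC are accessed at the same statements
```

Creating more names only helps when those names end up accessed at different statements; here three context-specific names yield two distinguishable groups.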

4.2 Study 2 

The goal of this study is to evaluate the impact of using pointer information provided by MoPPA and FIGS on the computation of flow dependence, one variety of data dependence, within a procedure. A statement $s_1$ is flow-dependent on a statement $s_2$ if $s_1$ may use the value set by $s_2$. Flow dependence can be used in important tasks such as program optimization and data-flow testing.
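To show how pointer precision feeds into flow dependence, here is a minimal sketch for straight-line code, assuming each statement's may-define, must-define, and use sets have already been derived from points-to information. It ignores control flow and call sites (the study additionally uses callsite side effects), and it assumes a write through a pointer with a single target is a strong update, which a real analysis would only allow for a single concrete location. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    name: str
    uses: frozenset = frozenset()      # locations the statement may read
    may_def: frozenset = frozenset()   # locations it may write
    must_def: frozenset = frozenset()  # locations it definitely writes (strong update)


def flow_dependences(stmts):
    """s depends on t if t precedes s, t may define a location l that s uses,
    and no statement in between must define l (straight-line code only)."""
    reaching = {}  # location -> names of statements whose definition may reach here
    deps = {}
    for s in stmts:
        deps[s.name] = set()
        for loc in s.uses:
            deps[s.name] |= reaching.get(loc, set())
        for loc in s.must_def:
            reaching[loc] = {s.name}                     # kills earlier definitions
        for loc in s.may_def - s.must_def:
            reaching.setdefault(loc, set()).add(s.name)  # weak update keeps them
    return deps


def example(p_targets):
    """s1: b = 0;  s2: *p = 1;  s3: x = b, with p's points-to set as input."""
    targets = frozenset(p_targets)
    strong = targets if len(targets) == 1 else frozenset()
    return [
        Stmt("s1", may_def=frozenset({"b"}), must_def=frozenset({"b"})),
        Stmt("s2", uses=frozenset({"p"}), may_def=targets, must_def=strong),
        Stmt("s3", uses=frozenset({"b"}), may_def=frozenset({"x"}), must_def=frozenset({"x"})),
    ]


print(sorted(flow_dependences(example({"a", "b"}))["s3"]))  # imprecise: s2 may write b
print(sorted(flow_dependences(example({"a"}))["s3"]))       # precise: only s1 reaches
```

With the less precise points-to set, s3 picks up a spurious dependence on s2; this is the kind of dependence that the Reduc column counts as eliminated.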

In this study, we computed the average number of statements on which a statement is flow-dependent. For each callsite, we used its side effects to compute flow dependences. The right side of Table 2 shows the results of this study when the pointer information is provided by MoPPA (Mo) or FIGS (FI), and the percentage of spurious flow dependences (Reduc) that can be eliminated by using information provided by MoPPA. The table shows that, for several programs (e.g., moria), using pointer information provided by MoPPA can significantly (by more than 10%) reduce the spurious flow dependences and thus may significantly improve the precision of program analyses that require data-flow information. Note that, for other programs (e.g., lharc) on which the reduction is less significant, using pointer information provided by MoPPA may still improve the precision of program analyses because the spurious information propagated across procedure boundaries may be reduced.
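Table 2 does not spell out how Reduc is computed. Our reading is the relative drop (FI − Mo)/FI. The Python check below, with the Mo and FI columns transcribed from Table 2, reproduces the reported Reduc values to within about a quarter of a percentage point; given that the averages are rounded, this is consistent with that reading, though it does not prove it. It also lists the programs whose reduction exceeds the 10% threshold used in the text.

```python
# (Mo, FI) average flow dependences per statement, transcribed from Table 2.
DEPS = {
    "dixie": (5.10, 6.19), "assem": (2.68, 3.62), "smail": (2.60, 3.20),
    "lharc": (2.47, 2.52), "simulate": (2.49, 2.58), "flex": (5.85, 6.57),
    "rolo": (3.96, 4.41), "space": (2.83, 3.03), "spim": (26.6, 42.8),
    "mpgplay": (6.94, 6.97), "espresso": (4.10, 4.31), "moria": (9.17, 11.66),
    "twmc": (4.26, 4.51), "nethack": (5.00, 5.40), "povray3": (10.01, 12.12),
}
# The Reduc column as printed in Table 2 (percent).
REPORTED = {
    "dixie": 17.7, "assem": 25.8, "smail": 19.0, "lharc": 2.0, "simulate": 3.4,
    "flex": 11.0, "rolo": 10.3, "space": 6.5, "spim": 37.9, "mpgplay": 0.4,
    "espresso": 4.8, "moria": 21.3, "twmc": 5.6, "nethack": 7.3, "povray3": 17.4,
}


def reduction(mo: float, fi: float) -> float:
    """Percentage of the FIGS-based dependences that disappear with MoPPA."""
    return 100.0 * (fi - mo) / fi


worst_gap = max(abs(reduction(*DEPS[p]) - REPORTED[p]) for p in DEPS)
above_10 = sorted(p for p in DEPS if reduction(*DEPS[p]) > 10)
print(f"largest gap to the Reduc column: {worst_gap:.2f} points")
print("programs above 10%:", above_10)
```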

4.3 Study 3 

The goal of this study is to evaluate the impact of using pointer information provided by MoPPA or FIGS on the precision and the efficiency of program analyses that require transitive interprocedural flow dependence. The study consists of two parts. The first part considers the impact on the computation of transitive interprocedural flow dependence. We measured the average number of statements that can transitively affect a specific statement $s$ through flow dependence. For convenience, we refer to this set of statements as the data slice with respect to $s$. We also measured the average time to compute a data slice. These measurements serve as an indicator of the impact of using such pointer information on program analyses that require transitive interprocedural flow dependence.

Table 3. Left: Average size of a data slice. Right: Average time in seconds to compute a data slice.

Table 4. Left: Average size of a program slice. Right: Average time in seconds to compute a program slice.

† Data are collected on one slice.
‡ 1k = 1,000.
* Data are unavailable: the system does not terminate within the time limit (10 hrs.) or runs out of memory. Data for povray3 are unavailable for the same reason.
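Since a data slice is defined by transitive flow dependence, it amounts to backward reachability in the flow-dependence graph. The sketch below assumes that graph is already built and simply follows dependence edges; it is not the authors' reuse-driven slicer, and it ignores calling context (a precise interprocedural slicer must only follow realizable call/return paths). The graph in the example is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def data_slice(deps: Dict[str, Iterable[str]], criterion: str) -> Set[str]:
    """Statements that can transitively affect `criterion` through flow dependence.

    deps[s] lists the statements that s is directly flow-dependent on, so the
    slice is everything reachable from `criterion` by following those edges.
    """
    seen: Set[str] = set()
    work = deque([criterion])
    while work:
        s = work.popleft()
        for t in deps.get(s, ()):
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen


def average_slice_size(deps: Dict[str, Iterable[str]], criteria: List[str]) -> float:
    return sum(len(data_slice(deps, c)) for c in criteria) / len(criteria)


# Hypothetical graph: s5 uses values set by s3 and s4, s4 uses s2, and so on.
DEPS = {"s5": ["s3", "s4"], "s4": ["s2"], "s3": ["s1"], "s2": ["s1"]}
print(sorted(data_slice(DEPS, "s5")))
print(average_slice_size(DEPS, ["s1", "s2", "s3", "s4", "s5"]))
```

Each spurious flow dependence adds an edge, and every statement reachable through that edge joins the slice, which is why a modest drop in direct dependences can shrink data slices considerably.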

Table 3 shows these two measurements when the pointer information is provided by MoPPA (Mo) and FIGS (FI). The table also shows the reduction in the size of a data slice (Reduc) when the pointer information is provided by MoPPA. We obtained the data by running a modified version of our reuse-driven slicer [?]. The table shows that, for many programs we studied (e.g., smail), using pointer information provided by MoPPA can significantly improve the precision and the efficiency of the computation of transitive flow dependence.

The second part of the study considers the impact of using pointer information provided by MoPPA and FIGS on program slicing [?], a program analysis that requires transitive flow dependences. We measured the average size of a program slice and the average time to compute such a slice. We obtained the data by running our reuse-driven program slicer on each subject. Table 4 shows the results when the pointer information is provided by MoPPA (Mo) and FIGS (FI). The table also shows the reduction in the size of a program slice (Reduc) when the pointer information is provided by MoPPA. The results indicate that, for



Efficient Computation of Parameterized Pointer Information 






            (I) Dependence        (II) Data slice                       (III) Program slice
            Size                  Size                  Time (s)        Size                    Time (s)
program     Mo     And    Reduc   Mo     And    Reduc   Mo     And      Mo      And     Reduc   Mo      And
space       2.83   3.03   6.5%    207    296    30.1%   0.6    78.8     2249    2504    10.2%   9.5     620
spim        26.6   38.5   31%     2107   2371   11.1%   743    1849     3434    3704    7.3%    3459    5094
mpgplay     6.94   6.96   0.2%    2054   2170   5.3%    19.4   23.6     3950    4120    4.1%    91.3    115
espresso    4.10   4.50   8.9%    2888   3374   14.4%   35.9   16k      5125    5732    10.6%   187     20k
moria       9.17   11.6   21%     3146   *      -       621    *        *       *       -       *       *
twmc        4.26   4.49   5.3%    1152   2385   51.7%   11.4   901      12884   12893   0.1%    851     20k
nethack     5.0    5.33   6.1%    2628   *      -       1146   *        *       *       -       *       *

Table 5. Comparing MoPPA (Mo) and Andersen’s algorithm (And): (I) average number of flow dependences; (II) size of a data slice and time in seconds to compute such a slice; (III) size of a program slice and time in seconds to compute such a slice.

† Data are reported for one slice.
‡ 1k = 1,000.
* Data are unavailable: the system does not terminate within the time limit (10 hours) or runs out of memory. Data for povray3 are unavailable for the same reason.



many of our subjects (e.g., smail), using pointer information provided by MoPPA can significantly improve the precision and efficiency of program slicing.

By considering the results of both parts of the study, we can conclude that 
using parameterized pointer information provided by MoPPA may significantly 
improve the precision and efficiency of many program analyses. 

4.4 Study 4 

The goal of this study is to compare MoPPA with Andersen’s algorithm [?] for supporting program analyses. We repeated Studies 2 and 3 on these two algorithms. Table 5 shows the results. Due to space limitations, we show only programs over 10,000 lines of code. The results closely resemble those presented in Studies 2 and 3. For example, for space, when information provided by MoPPA is used, Table 5 shows that a slicer runs 65 times faster and computes 10% smaller slices than when information provided by Andersen’s algorithm is used, whereas Table 4 shows that a slicer runs 56 times faster and computes 10% smaller slices than when information provided by FIGS is used. This similarity is not surprising because other studies show that the information computed by FIGS is almost the same as that computed by Andersen’s algorithm. The results strengthen our conclusion that using parameterized pointer information provided by MoPPA may significantly improve the precision and efficiency of program analyses.



5 Related Work 

Beginning with FIGS, several flow-insensitive, context-sensitive algorithms [?] that compute one solution for each procedure have been developed. MoPPA extends one of these algorithms by parameterizing the pointer information for each procedure. Similar approaches can be used to extend other algorithms.

Emami et al.’s algorithm [?] and Wilson and Lam’s algorithm [?] also compute parameterized pointer information. Emami et al.’s algorithm analyzes a procedure under each specific calling context. The algorithm uses symbolic names to represent local variables that are indirectly accessed but not visible in the current procedure. The symbolic names let the algorithm reuse the pointer information computed for a procedure under several calling contexts if the alias configuration for its inputs is the same under these calling contexts. Wilson and Lam’s algorithm further develops this idea. To increase the opportunity for reuse, the algorithm uses extended parameters to represent global variables and memory locations that can be accessed through dereferences of formal parameters and global pointers in a procedure. In both algorithms, when the alias configuration for the inputs of a procedure changes, the procedure must be reanalyzed. MoPPA differs from these two algorithms in that MoPPA analyzes a procedure independently of its calling contexts. Thus, the summary information computed for a procedure can be used for all its calling contexts. This maximal reuse, MoPPA's flow insensitivity, and the separation of global information from local information contribute to the efficiency and the scalability of MoPPA.

Other pointer analysis algorithms (e.g., [?]) also use symbolic names to represent memory locations whose addresses are passed into a procedure from calling contexts. However, in these algorithms, because the symbolic names are created before the pointer information at the callsites is computed, two symbolic names may represent the same memory location under a calling context. In the final pointer solution, the symbolic names must be replaced with concrete values. Therefore, these algorithms do not provide parameterized pointer information.

Several other existing pointer analysis algorithms use a modular approach for computing pointer information. One such algorithm is Chatterjee, Ryder, and Landi’s Relevant Context Inference (RCI) [?]. MoPPA and RCI differ in that (1) RCI computes non-parameterized pointer information, and (2) RCI computes pointer information using a flow-sensitive approach and thus may not scale to large programs. Another modular pointer analysis algorithm is Cheng and Hwu’s flow-sensitive algorithm [?], which uses access paths to identify a memory location. One way that MoPPA differs from Cheng and Hwu’s algorithm is efficiency. Cheng and Hwu’s algorithm must propagate global pointer information from procedure to procedure, and must iterate over pointer assignments and points-to relations using an approach similar to Andersen’s algorithm [?]. In contrast, MoPPA reduces the information propagated across procedure boundaries by capturing the global pointer information in a global graph. MoPPA also avoids iteration over pointer assignments and points-to relations by merging points-to nodes using an approach similar to Steensgaard’s algorithm [?]. Another way that MoPPA differs from Cheng and Hwu’s algorithm is its support for interprocedural program analyses. Access paths used in Cheng and Hwu’s algorithm can also identify different memory locations in a procedure under different calling contexts. However, because one memory location may be identified by several access paths, a program analysis using this pointer information may



An access path is similar to an object name defined in this paper. 









propagate more information across procedure boundaries than using the infor- 
mation provided by MoPPA. In addition, mapping an access path from a called 
procedure to a calling procedure is more expensive than mapping an auxiliary 
parameter. 
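The contrast drawn above between Andersen-style iteration and Steensgaard-style merging of points-to nodes can be illustrated with a small unification-based points-to analysis. This is a sketch of Steensgaard's idea only, not of MoPPA: it handles four assignment forms with a union-find structure in which every equivalence class has at most one points-to target, so merging two classes immediately forces their targets to merge as well. The class and method names are hypothetical; real implementations also use union by rank with path compression, delay joins for non-pointer copies, and handle calls, all of which are omitted.

```python
class Node:
    """Union-find element standing for an equivalence class of abstract locations."""

    def __init__(self, name=None):
        self.parent = self
        self.names = {name} if name is not None else set()
        self.target = None  # at most one points-to successor per class


def find(n):
    while n.parent is not n:
        n = n.parent
    return n


def join(a, b):
    """Unify two classes; if both point somewhere, unify their targets as well."""
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    a.names |= b.names
    ta, tb = a.target, b.target
    if ta is None:
        a.target = tb
    elif tb is not None:
        join(ta, tb)  # this recursive merge replaces iteration to a fixed point
    return find(a)


class UnificationPointsTo:
    """Steensgaard-style points-to analysis for x = &y, x = y, x = *y, *x = y."""

    def __init__(self):
        self.vars = {}

    def var(self, v):
        if v not in self.vars:
            self.vars[v] = Node(v)
        return find(self.vars[v])

    def deref(self, n):
        """Class that n points to, creating an empty placeholder class if needed."""
        n = find(n)
        if n.target is None:
            n.target = Node()
        return find(n.target)

    def assign(self, kind, x, y):
        if kind == "addr":     # x = &y
            join(self.deref(self.var(x)), self.var(y))
        elif kind == "copy":   # x = y
            join(self.deref(self.var(x)), self.deref(self.var(y)))
        elif kind == "load":   # x = *y
            join(self.deref(self.var(x)), self.deref(self.deref(self.var(y))))
        elif kind == "store":  # *x = y
            join(self.deref(self.deref(self.var(x))), self.deref(self.var(y)))
        else:
            raise ValueError(f"unknown assignment kind: {kind}")

    def points_to(self, v):
        t = self.var(v).target
        return set() if t is None else set(find(t).names)


s = UnificationPointsTo()
for stmt in [("addr", "p", "a"), ("addr", "q", "b"), ("copy", "p", "q")]:
    s.assign(*stmt)
print(sorted(s.points_to("p")), sorted(s.points_to("q")))
```

After `p = &a; q = &b; p = q`, unification puts `a` and `b` in one class, so `q` is also reported to point to `a`, whereas Andersen's algorithm would keep `q` pointing only to `b`. That loss of precision is the price paid for avoiding iteration.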

Many flow-insensitive algorithms can be described as building points-to graphs or as generating and solving a set of constraints. Both approaches have advantages and disadvantages. Foster, Fähndrich, and Aiken proposed a polymorphic flow-insensitive points-to analysis framework that computes pointer information by solving constraints [?]. When the framework uses term constraints, the resulting algorithm is a variant of FIGS. The framework differs from MoPPA in that it computes non-parameterized pointer information. Studies show that their current implementation of the framework may not scale to large programs [?].

Some existing pointer analysis algorithms provide conditional pointer information, in which a points-to relation may be associated with a condition that specifies the calling contexts under which this relation may hold. Although such conditions may help a program analysis reduce the amount of spurious information propagated across procedure boundaries [?], adding conditions to the points-to relations may increase the complexity of the pointer analysis. Studies show that these algorithms may not scale to large programs [?].

6 Conclusion 

This paper presents MoPPA, a modular algorithm that computes parameter- 
ized pointer information for C. The paper also presents studies that compare 
MoPPA with FIGS and Andersen’s algorithm. The studies show that MoPPA is 
efficient and that using parameterized pointer information provided by MoPPA 
can significantly improve the precision and efficiency of program analyses. 

Due to space limitations, this paper does not present the details of handling memory accesses using constructs such as casting that require knowledge of the physical layout of a structure. Several existing approaches (e.g., [22]) can be incorporated into MoPPA to handle such accesses. Our future work will include investigation of the impact of different approaches on MoPPA. Our future work will also include additional empirical studies on larger programs.
Abstract. For controllable timed automata, a general parametric optimization framework based on automata theory is proposed. The framework is general enough to incorporate both the parametric analysis problem and the controller synthesis problem of computer systems. We propose an algorithm for the construction of the characterization of the parameter constraints and controller synthesis, which in turn yields a linear programming solution to parametric optimization.



1 Introduction 

As increasing efforts have been devoted to applying CAV techniques to real- 
world systems, it becomes urgent to design appropriate models and analytical 
techniques to deal with parametric optimization of real-time systems. Many (if 
not most) of the conventional CAV techniques are only capable of classifying 
a given system as ’good’ or ’bad.’ To make things even more complicated, the 
behavior of many of the real-world systems is often influenced by various en- 
gineering constraints, e.g. assumptions on environments in which the systems 
reside. Suppose the performance of a system is evaluated once with respect to 
a given constraint setting. Once the constraint changes, traditionally either the 
evaluation process is restarted, or the performance under the new constraint 
is calculated using the so-called extrapolation technique utilizing results from 
known system constraints. The former is somewhat time-consuming, whereas 
the latter suffers from imprecision. In addition, neither technique is appropriate 
for answering a question like: find the environment assumptions under which the 
system performs best. It is therefore highly desirable to employ an evaluation 
strategy that is parametric in nature. That is, the variations of the engineering 
constraints are treated as parameters, and the evaluation ends up including such 
parameters as part of the performance measure. 

We demonstrate in this paper a general framework within which parametric 
optimization is carried out in a parametric fashion for real-time systems modeled 
by controllable timed automata. As we shall see later, our framework incorpo- 
rates both parametric analysis and controller synthesis, which are two issues that 

* The work is partially supported by National Science Council, Taiwan, ROC under 
grant NSC 89-2213-E-001-046. 
Fig. 1. A Simple Controllable Timed Automaton.



have received increasing attention in the CAV community lately [3-7, 11, 13, 16- 
18]. Aside from being more general than the problem of parametric analysis or 
controller synthesis alone, a unique feature of our solution to parametric opti- 
mization lies in that we are able to construct a characterization of the parameter 
constraints and controller synthesis, which in turn yields a linear programming 
solution to parametric optimization. Such a characterization is valuable both in 
the construction of a solution controller and in the derivation of the optimal 
performance. By doing so, re-evaluation of the performance of a system under a 
new constraint setting is as easy as solving the parameterized inequalities with 
respect to a new set of parameters. More interestingly, by encapsulating the envi- 
ronment’s parameters into our framework of performance evaluation, it becomes 
feasible to find out the best system performance by solving an optimization 
version of the parametric inequalities. 

Example 1. A Simple Automaton. To give the reader a better feel for the issue of parametric optimization of controllable timed automata, consider Figure 1, in which a simple controllable timed automaton is shown. Two operation modes are represented by the ovals, in which we have mode names $q_0$ and $q_1$, and invariance conditions $x < 1$ and $\mathit{true}$, respectively. Between the modes, there is an arc for a transition labeled with a triggering condition (above). The triggering condition contains a special symbol $\sigma$, which represents the enabling signal from the controller. In this paper, we shall adopt the approach in [6, 7, 13], which assumes that $\sigma$ is an uninterpreted Boolean function of regions [1]. Notice that the automaton is parametric in the sense that “cost” is a static parameter for some optional functions of the automaton.

An example specification which we may want to analyze is written in PCTL (Parametric CTL, defined in Section 2) as follows.

$\phi = (\mathit{cost} < 100) \wedge (\theta \geq 1) \wedge \forall\Box_{\leq \theta}\, q_0$

The formula says that the cost must be less than 100 dollars, parameter $\theta$ is greater than or equal to 1, and we want to find out the characterization of $\theta$ such that $q_0$ is true in all computations within $\theta$ time units from the initial state. As the transition from $q_0$ to $q_1$ is ‘controlled’ by the environment through the control symbol $\sigma$, whether $\phi$ holds depends not only on the values of the static parameters but also on the control policy imposed on the automaton by the controller (i.e., the environment).
Suppose we are given an optimization metric like ‘$\mathit{cost} - \theta$.’ For the aforementioned ‘parametric’ timed automaton, specification, and optimization metric, parametric optimization in our setting boils down to finding a control policy under which the optimization metric is maximized for a parameterized automaton meeting the specification.

Our framework is capable of analyzing trace-oriented optimization problems of real-time systems. For example, with the same automaton used in Figure 1, we may have another specification $\phi' = \mathit{cost} < 12 \wedge \forall\Box(q_0 \rightarrow \forall\Diamond_{\leq \theta}\, q_1)$, which says that $q_1$ will eventually be true within time $\theta$ whenever $q_0$ is true, and an optimization metric $-\theta$. Then the parametric optimization problem asks for the smallest deadline $\theta$ that holds in all computations from a $q_0$ state to a $q_1$ state, subject to the restriction of $\phi'$. Thus it is easy to see that our framework is more general than previous ones like [10].

In Example 4 of Subsection 3.5, after we have presented our algorithms, we shall present the answers to the parametric optimization problems with these two just-mentioned sets of specifications and optimization metrics. ∎

Our parametric optimization problem can be thought of as a generalization of both parametric analysis (“what parameter settings make a system correct?”) [3, 4, 11, 16-18] and controller synthesis (“what controller, if any, induces a correct behavior?”) [5-7, 13] of timed automata. For parametric analysis, parameter variables are associated with either temporal operators or timed automata, and answers are sought to questions such as ‘Does there exist a valuation of the parameters so that the system meets a given property?’ or ‘Is it the case that for all valuations of the parameters the given property always holds for the system?’
It has been shown that for parametric timed automata in which parameter variables are allowed to be compared with clocks, the verification problem is in general undecidable [4]. If, however, parameter variables appear only in the specification (i.e., in temporal formulas), then the problem becomes decidable [4]. In the context of parametric analysis, Alur et al. [3] considered the so-called ‘model measuring’ problem for parametric linear time temporal logic (PLTL).
Model measuring is an extension of the standard model checking problem in 
that the latter only returns ‘yes/no’ answers, whereas the former provides an- 
swers to a number of questions regarding the set of parameter valuations for 
which the given specification is fulfilled. Emerson and Trefler [11], on the other
hand, investigated the model checking problem for parameterized real-time com- 
putation tree logic (PRTCTL). In [3], PLTL is defined over conventional Kripke 
structures, whereas in [11], both untimed and (discrete) timed structures are 
considered for PRTCTL. With respect to dense-time automata, Wang [16] gave 
a complete characterization of the set of parameter valuations satisfying a spec- 
ification expressed in parametric computation tree logic (PCTL) in terms of a 
set of linear inequalities. The work of [16] has subsequently been generalized 
in [17], which shows that parametric analysis remains decidable for the model 
in which timed automata are augmented by static parameters (i.e., nontiming 
parameters) and temporal formulas are parameterized by both timing and non- 
timing parameters. In comparison with its predecessors in parametric analysis of 
timed automata, Wang’s algorithm in [17], based upon the technique of dynamic
programming, is easier to understand, implement, and analyze. 

What makes controller synthesis an important issue is that many interesting 
real-world systems tend to be open in nature, meaning that their behaviors are 
influenced by the environment. Since the seminal work of Ramadge and Wonham 
[14], the use of automata and formal languages to reason about controllability 
of discrete event dynamic systems has received much attention in the control 
community in the past decade. Being recognized as one of the most popular 
models for representing real-time systems, timed automata [2] have naturally 
become the underlying model for which various controller synthesis issues are 
investigated, aside from a very successful role such a model has played in the 
verification aspects of real-time systems. Consider a dynamic system (modeled 
by a timed automaton) whose behaviors are to be controlled in a certain way 
so as to meet certain predefined requirements. The controller synthesis problem, 
simply speaking, is to find out whether, for a given system, there is a controller
through which the interaction between the system and the controller results 
in only computations of ‘good’ behavior. (If such a controller exists, it is also 
desirable to construct it effectively.) The interested reader is referred to [6] for a 
symbolic approach for controller synthesis. As opposed to providing only yes/no 
answers in the conventional framework of controller synthesis, a recent article [5] 
dealt with quantitative properties of behaviors for controllable timed automata. 

In this paper, we move a step further from previous work [3-7, 11, 13, 16-18] 
by considering the controller synthesis issue for parametric timed automata with 
respect to system requirements specified by parametric computation-tree logic 
(PCTL) (see [16,17]). By explicitly allowing static parameters in our model, 
a richer parametric optimization framework, in comparison with that of [5], is 
provided. To the best of our knowledge, our work is the first to take a parametric approach to the optimization of synthesized controllers. Unlike the (fixed-point based) backward reachability approach employed in [13] for the controller synthesis problem of timed automata, we generalize the parametric analysis technique devised in [17] to derive, for a given parametric timed automaton $A$ and a PCTL formula $\phi$, a complete characterization of the parameter constraint and controller synthesis which is satisfiable if and only if there exists a controller forcing $A$ to meet property $\phi$. The characterization contains the information for both controller synthesis and parameter constraint, which enables us
to formulate a unified framework in integer linear programming for parametric 
optimization of real-time systems. We feel that our approach is interesting in 
its own right and may have applications to the analysis of related problems for 
real-time systems. 

The remainder of this paper is organized as follows. Section 2 introduces 
the model of statically parametric plants, parametric computation tree logic, as 
well as the parametric optimization problem. An algorithm, together with an 
illustrating example, is demonstrated in Section 3 for solving the parametric 
optimization problem. Section 4 concludes our work. 
2 Parametric Optimization Problem

A parametric optimization problem instance in our framework is given as a tuple $(A, \phi, \lambda)$ such that $A$ is a controllable timed automaton (statically parametric plant, defined in Subsection 2.1) for the description of the system behavior, $\phi$ is a temporal logic formula for the requirements on the system behaviors, and $\lambda$ is a linear expression of parameters for the performance measurement. The aim of the problem is to find a valuation (interpretation) of the parameters that maximizes $\lambda$ and makes $A$ satisfy $\phi$ under the interpretation for some control strategy. A framework for minimization of $\lambda$ can be similarly defined by changing the signs of the coefficients in $\lambda$.
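The paper obtains the optimum by handing a derived characterization of the parameter constraints to (integer) linear programming. Purely to illustrate the shape of that final step and the sign-flip remark above, the sketch below assumes the characterization has already been computed as a disjunction of conjunctions of linear constraints over the static parameters (our assumption about its form). It then finds the best valuation by exhaustive search over a bounded box of nonnegative integers instead of calling an ILP solver, which is only feasible for toy instances. The constraints and metric in the example are invented and are not the answers to Example 1.

```python
import itertools
import operator
from typing import Dict, List, Optional, Tuple

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}

# A linear constraint  sum_p coeffs[p] * p  op  c, written as (coeffs, op, c).
Constraint = Tuple[Dict[str, int], str, int]


def value(expr: Dict[str, int], val: Dict[str, int]) -> int:
    return sum(k * val[p] for p, k in expr.items())


def optimize(params: List[str], disjuncts: List[List[Constraint]], metric: Dict[str, int],
             bound: int, maximize: bool = True) -> Optional[Tuple[int, Dict[str, int]]]:
    """Best metric value over valuations in {0..bound}^params satisfying some disjunct.

    Minimization is handled as in the text: maximize the metric with all
    coefficient signs flipped, then flip the sign of the optimum back.
    """
    if not maximize:
        flipped = {p: -k for p, k in metric.items()}
        res = optimize(params, disjuncts, flipped, bound, True)
        return None if res is None else (-res[0], res[1])
    best = None
    for values in itertools.product(range(bound + 1), repeat=len(params)):
        val = dict(zip(params, values))
        if any(all(OPS[op](value(e, val), c) for e, op, c in conj) for conj in disjuncts):
            score = value(metric, val)
            if best is None or score > best[0]:
                best = (score, val)
    return best


# Invented characterization over hypothetical parameters a, b:
#   (a + b <= 10 and a >= 2)  or  (b == 0 and a <= 4)
CHAR = [[({"a": 1, "b": 1}, "<=", 10), ({"a": 1}, ">=", 2)],
        [({"b": 1}, "==", 0), ({"a": 1}, "<=", 4)]]
METRIC = {"a": 3, "b": -1}  # lambda = 3a - b
print(optimize(["a", "b"], CHAR, METRIC, bound=20))
print(optimize(["a", "b"], CHAR, METRIC, bound=20, maximize=False))
```

A disjunction of polyhedra can be optimized one disjunct at a time and the best result kept, which is what the `any(...)` test does implicitly here.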

2.1 Statically Parametric Plant (SPP) 

An SPP is a timed automaton extended with linear constraints of static parameters and transition controls. In an SPP, one may combine control signals, timing inequalities on clock readings, and linear inequalities of static parameters to write the invariance and transition conditions. Such a combination is called a state predicate and is defined formally in the following. Given a set $P$ of basic propositions, a set $X$ of clocks, and a set $H$ of parameter variables, a state predicate $\eta$ of $P$, $X$, and $H$ has the following syntax rules.

$\eta ::= \mathit{false} \mid p \mid x - y \sim c \mid x \sim c \mid \sum_i a_i \alpha_i \sim c \mid \eta_1 \vee \eta_2 \mid \neg \eta_1$

where $p \in P$, $x, y \in X$, $a_i, c \in \mathcal{N}$, $\alpha_i \in H$, $\sim \in \{\leq, <, =, \geq, >\}$, and $\eta_1, \eta_2$ are state predicates. Notationally, we let $B(P, X, H)$ be the set of all state predicates on $P$, $X$, and $H$. Note that the parameter variables considered in $H$ are static because their values do not change with time in the computation of an automaton. A state predicate with only $\sum_i a_i \alpha_i \sim c$ type literals is called static.
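The grammar of state predicates maps directly onto an abstract syntax tree. A minimal Python sketch follows; the class and function names are ours, not the paper's. It evaluates a state predicate against the set of true propositions, the clock readings, and the parameter values, derives conjunction from disjunction and negation, and checks whether a predicate is static. Treating $\sigma$ as a proposition in the example follows the use of $B(\{\sigma\}, X, H)$ for triggering conditions in Definition 1; classifying $\mathit{false}$ as non-static is our own choice, since the definition does not settle it.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

OPS = {"<=": operator.le, "<": operator.lt, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}


@dataclass(frozen=True)
class FalseP:            # false
    pass


@dataclass(frozen=True)
class Prop:              # p, with p in P
    name: str


@dataclass(frozen=True)
class ClockDiff:         # x - y ~ c
    x: str
    y: str
    op: str
    c: int


@dataclass(frozen=True)
class ClockBound:        # x ~ c
    x: str
    op: str
    c: int


@dataclass(frozen=True)
class Linear:            # sum_i a_i * alpha_i ~ c, with terms = ((a_i, alpha_i), ...)
    terms: Tuple[Tuple[int, str], ...]
    op: str
    c: int


@dataclass(frozen=True)
class Or:                # eta1 or eta2
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class Not:               # not eta1
    arg: "Pred"


Pred = Union[FalseP, Prop, ClockDiff, ClockBound, Linear, Or, Not]


def And(a: Pred, b: Pred) -> Pred:
    """Conjunction is derived: a and b == not(not a or not b)."""
    return Not(Or(Not(a), Not(b)))


def holds(eta: Pred, props: Set[str], clocks: Dict[str, float], params: Dict[str, int]) -> bool:
    """Evaluate eta given the true propositions, clock readings, and parameter values."""
    if isinstance(eta, FalseP):
        return False
    if isinstance(eta, Prop):
        return eta.name in props
    if isinstance(eta, ClockDiff):
        return OPS[eta.op](clocks[eta.x] - clocks[eta.y], eta.c)
    if isinstance(eta, ClockBound):
        return OPS[eta.op](clocks[eta.x], eta.c)
    if isinstance(eta, Linear):
        return OPS[eta.op](sum(a * params[p] for a, p in eta.terms), eta.c)
    if isinstance(eta, Or):
        return holds(eta.left, props, clocks, params) or holds(eta.right, props, clocks, params)
    if isinstance(eta, Not):
        return not holds(eta.arg, props, clocks, params)
    raise TypeError(f"not a state predicate: {eta!r}")


def is_static(eta: Pred) -> bool:
    """True if every literal in eta is a linear constraint on parameters."""
    if isinstance(eta, Linear):
        return True
    if isinstance(eta, Or):
        return is_static(eta.left) and is_static(eta.right)
    if isinstance(eta, Not):
        return is_static(eta.arg)
    return False  # false, propositions, and clock literals count as non-static here


# Figure 1, as described in the text: trigger sigma and cost > 10; invariant of q0 is x < 1.
trigger = And(Prop("sigma"), Linear(((1, "cost"),), ">", 10))
inv_q0 = ClockBound("x", "<", 1)
print(holds(trigger, {"sigma"}, {"x": 0.5}, {"cost": 20}))
```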

Definition 1. (Statically Parametric Plant): A statically parametric plant (SPP) is a tuple $\langle Q, q_0, X, H, \mu, E, \tau, \pi \rangle$ with the following restrictions.

• $Q$ is a finite set of modes (operation modes, or control locations).

• $q_0 \in Q$ is the initial mode.

• $X$ is a set of clocks with readings in $\mathcal{R}^+$, i.e., the set of nonnegative reals.

• $H$ is a set of parameter variables with values in $\mathcal{N}$, i.e., the set of nonnegative integers.

• $\mu$ is a mapping from $Q$ such that for each $q \in Q$, $\mu(q) \in B(\emptyset, X, H)$ is the invariance condition true in $q$.

• $E \subseteq Q \times Q$ is the set of transitions.

• $\tau : E \to B(\{\sigma\}, X, H)$ is a mapping which defines the transition-triggering conditions. Here $\sigma$ is a control signal symbol representing the enabling/disabling signal from the controller. Conceptually, $\sigma$ is an uninterpreted Boolean function of states whose presence on edge $e$ marks that edge as ‘controllable’.

• $\pi : E \to 2^X$ defines the set of clocks to be reset during each transition. ∎
Figure 1 displays a simple example of an SPP in which $Q = \{q_0, q_1\}$, $X = \{x\}$, $H = \{cost\}$, and $\sigma$ is the control signal symbol associated with the only transition of the plant. Notice that in this example, the invariance conditions (defined by $\mu$) associated with $q_0$ and $q_1$ are $x \leq 1$ and $\mathit{true}$, respectively, although in general parameter variables are allowed to take part in invariance conditions. An SPP starts its execution at its initial mode $q_0$. We assume that initially all clocks read zero. Between mode transitions, all clocks increment their readings at a uniform rate. A transition of an SPP may be fired when its triggering condition is satisfied. During a transition from mode $q_0$ to $q_1$, the reading of each $x \in \pi((q_0, q_1))$ is reset. It is worth pointing out that in our setting it is legal to let time elapse even in the presence of an enabled transition. (The reader is encouraged to contrast our model with that of [5], in which transition firings are assumed to be 'urgent.') For instance, in state $q_0$ with $x = 0.5$ for the SPP depicted in Figure 1, the computation may either stay in $q_0$ while letting the clock run, or take the transition from $q_0$ to $q_1$, provided that $cost > 10$ and the controller assigns true to $\sigma$. The behavior of an SPP thus depends not only on the interpretation of the parameter variables but also on the control policy enforced by the environment during the course of its computation. Notice that although the triggering condition hinges on both the control signal symbol $\sigma$ and the static parameter $cost$, the two play entirely different roles in enabling or disabling the transition: the values of static parameters are fixed before the execution of the SPP, whereas the control signal is enabled or disabled step by step by the controller as the computation proceeds.

Note that we allow the control signal $\sigma$ to participate in the construction of triggering conditions. This differs from the controller definition in [13], in which at most one controllable transition can be enabled at any moment.

Definition 2. (State): A state of SPP $A = (Q, q_0, X, H, \mu, E, \tau, \pi)$ is a pair $(q, \nu)$ such that $q \in Q$ and $\nu$ is a mapping from $X$ to $\mathbb{R}^+$ (i.e., $\nu$ represents the current clock readings). Let $U_A$ be the state set of $A$. ||

Definition 3. (Controller): Given an SPP $A$, a controller $\chi$ is a Boolean function $U_A \mapsto \{\mathit{true}, \mathit{false}\}$ which intuitively denotes the action on the control signal (i.e., $\sigma$) to enable or disable transitions according to the current state. Since the controller does not depend on the history, it is also called a simple controller. ||
We write $\sigma^{\chi}(s)$ to denote the truth value of $\sigma$ at state $s$ under controller $\chi$. (If $\chi$ is clear from the context, $\sigma(s)$ is used as a shorthand.) Clearly, it is useless to enable a transition in a mode other than the source mode of that transition. Thus, at a given state $(q, \nu)$, it is reasonable to consider only controllers $\chi$ that disable all transitions whose source modes differ from $q$. It should be noted that the same SPP may generate different computations under different controllers and different interpretations of its parameter variables.

Definition 4. (Interpretation): An interpretation $I$ for $H$ is a mapping from $\mathbb{N} \cup H$ to $\mathbb{N}$ such that $I(c) = c$ for all $c \in \mathbb{N}$. ||
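Definition 5. (Satisfaction of State Predicate): A state $(q, \nu)$ satisfies state predicate $\eta$ under controller $\chi$ and interpretation $I$, written $(q, \nu) \models^{\chi}_{I} \eta$, iff

• $(q, \nu) \not\models^{\chi}_{I} \mathit{false}$;

• $(q, \nu) \models^{\chi}_{I} \sigma$ iff $\chi((q, \nu))$ (in words, $\sigma$ is enabled by $\chi$ at $(q, \nu)$);

• $(q, \nu) \models^{\chi}_{I} x - y \sim c$ iff $\nu(x) - \nu(y) \sim c$;

• $(q, \nu) \models^{\chi}_{I} x \sim c$ iff $\nu(x) \sim c$;

• $(q, \nu) \models^{\chi}_{I} \sum_i a_i \alpha_i \sim c$ iff $\sum_i a_i I(\alpha_i) \sim c$;

• $(q, \nu) \models^{\chi}_{I} \eta_1 \vee \eta_2$ iff $(q, \nu) \models^{\chi}_{I} \eta_1$ or $(q, \nu) \models^{\chi}_{I} \eta_2$; and

• $(q, \nu) \models^{\chi}_{I} \neg \eta_1$ iff $(q, \nu) \not\models^{\chi}_{I} \eta_1$.

If $(q, \nu) \models^{\chi}_{I} \eta$ for all $\chi$, we may write $(q, \nu) \models_{I} \eta$. If $(q, \nu) \models_{I} \eta$ for all $I$, we may write $(q, \nu) \models \eta$. ||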

Definition 6. (Transitions): Given two states $(q, \nu)$ and $(q', \nu')$, there is a mode transition from $(q, \nu)$ to $(q', \nu')$ in $A$ under controller $\chi$ and interpretation $I$, in symbols $(q, \nu) \rightarrow^{\chi}_{I} (q', \nu')$, iff $(q, q') \in E$, $(q, \nu) \models^{\chi}_{I} \mu(q) \wedge \tau(q, q')$, $(q', \nu') \models^{\chi}_{I} \mu(q')$, $\forall x \in \pi(q, q')\,(\nu'(x) = 0)$, and $\forall x \notin \pi(q, q')\,(\nu'(x) = \nu(x))$. In words, for the transition $(q, \nu) \rightarrow^{\chi}_{I} (q', \nu')$ along the edge $(q, q')$ to take place under controller $\chi$ and interpretation $I$, the starting and ending invariance conditions (i.e., $\mu(q)$ and $\mu(q')$) must hold in modes $q$ and $q'$, respectively, and the associated triggering condition $\tau(q, q')$ must be met as well. Meanwhile, all the clocks in $\pi(q, q')$ are reset to zero, while the remaining clock readings stay unchanged. (That is, transition firing is assumed to take place instantaneously.) ||
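To make Definition 6 concrete, the following Python sketch checks a single mode transition of the Figure 1 SPP and lets time pass. Because Figure 1 is not reproduced here, it assumes that the invariance condition of $q_0$ is $x \leq 1$ and that the transition resets no clocks; the names `SPP`, `step`, and `delay` are hypothetical illustrations, not part of the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Valuation = Dict[str, float]
State = Tuple[str, Valuation]  # (mode q, clock valuation nu)


@dataclass
class SPP:
    # mu(q): invariance condition, evaluated on (nu, interpretation I)
    invariant: Dict[str, Callable[[Valuation, Dict[str, int]], bool]]
    # tau(q, q'): triggering condition, evaluated on (nu, I, sigma)
    guard: Dict[Tuple[str, str], Callable[[Valuation, Dict[str, int], bool], bool]]
    # pi(q, q'): clocks reset by the transition
    reset: Dict[Tuple[str, str], FrozenSet[str]]


def step(spp: SPP, state: State, target: str,
         chi: Callable[[State], bool], interp: Dict[str, int]) -> Optional[State]:
    """Return (q', nu') if (q, nu) ->_{chi,I} (q', nu') per Definition 6, else None."""
    q, nu = state
    edge = (q, target)
    if edge not in spp.guard:                       # (q, q') must be in E
        return None
    sigma = chi(state)                              # controller decision at (q, nu)
    if not (spp.invariant[q](nu, interp) and spp.guard[edge](nu, interp, sigma)):
        return None
    nu2 = {x: 0.0 if x in spp.reset[edge] else v for x, v in nu.items()}
    if not spp.invariant[target](nu2, interp):      # mu(q') must hold afterwards
        return None
    return (target, nu2)


def delay(state: State, d: float) -> State:
    """(q, nu) + d: all clocks advance at the same rate."""
    q, nu = state
    return (q, {x: v + d for x, v in nu.items()})


# Figure 1 (assumed details: mu(q0) = x <= 1, no clock resets on q0 -> q1)
fig1 = SPP(
    invariant={"q0": lambda nu, I: nu["x"] <= 1, "q1": lambda nu, I: True},
    guard={("q0", "q1"): lambda nu, I, sigma: sigma and I["cost"] > 10},
    reset={("q0", "q1"): frozenset()},
)
```

For example, from `("q0", {"x": 0.5})` with `cost = 20`, `step` returns `("q1", {"x": 0.5})` when the controller enables $\sigma$ and `None` when it does not, which matches the informal discussion of Figure 1.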

For ease of expression, given a state $(q, \nu)$ and a $\delta \in \mathbb{R}^+$, we let $(q, \nu) + \delta = (q, \nu + \delta)$ be the state that agrees with $(q, \nu)$ in every aspect except that $(\nu + \delta)(x) = \nu(x) + \delta$ for all $x \in X$.

Definition 7. ($(q, \nu)$-Run of Controlled and Interpreted SPP): An infinite computation of $A = (Q, q_0, X, H, \mu, E, \tau, \pi)$ starting at state $(q, \nu)$ under controller $\chi$ and interpretation $I$ is called a $(q, \nu)$-run and is a sequence $((q_1, \nu_1, t_1), (q_2, \nu_2, t_2), \ldots)$ such that

• $q = q_1$ and $\nu = \nu_1$;

• for each $t \in \mathbb{R}^+$, there is an $i \in \mathbb{N}$ such that $t_i > t$ (meaning that the run is diverging);

• for each integer $i \geq 1$ and for each real $0 \leq \delta \leq t_{i+1} - t_i$, $(q_i, \nu_i) + \delta \models^{\chi}_{I} \mu(q_i)$ (meaning that the invariance condition $\mu(q_i)$ holds continuously throughout the time interval $[t_i, t_{i+1}]$); and

• for each $i \geq 1$, $A$ goes from $(q_i, \nu_i)$ to $(q_{i+1}, \nu_{i+1})$ because of

 - a mode transition, i.e., $t_i = t_{i+1} \wedge (q_i, \nu_i) \rightarrow^{\chi}_{I} (q_{i+1}, \nu_{i+1})$; or

 - time passage, i.e., $t_i < t_{i+1} \wedge (q_i, \nu_i) + (t_{i+1} - t_i) = (q_{i+1}, \nu_{i+1})$. ||



2.2 Parametric Computation-Tree Logic

Parametric Computation-Tree Logic (PCTL) is used for specifying the design requirements and is defined with respect to a given SPP. Suppose we are given an SPP $A = (Q, q_0, X, H, \mu, E, \tau, \pi)$. A PCTL formula $\phi$ for $A$ has the following syntax rules.

$\phi ::= \eta \mid \phi_1 \vee \phi_2 \mid \neg \phi_1 \mid \exists \phi_1 \mathcal{U}_{\sim\theta} \phi_2 \mid \forall \phi_1 \mathcal{U}_{\sim\theta} \phi_2$

Here $\eta \in B(Q, X, H)$, $\phi_1$ and $\phi_2$ are PCTL formulas, and $\theta \in H$. Note that mode names are used as basic propositions for the specification of timely mode changes. $\exists$ means "there exists a computation," and $\forall$ means "for all computations." $\phi_1 \mathcal{U}_{\sim\theta} \phi_2$ means that along a computation, $\phi_1$ is true until $\phi_2$ becomes true, and $\phi_2$ happens at a time $\sim \theta$. For example, in a requirement like $cost = deadline + 5 \wedge \forall q_0 \mathcal{U}_{<deadline} q_1$, the parameters "cost" and "deadline" are related, and we require that along all computations, $q_0$ remains true until $q_1$ becomes true within "deadline" time units.

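The parameter variables that appear as subscripts of modal formulas can also be used as parameter variables in the SPP. We also adopt the following standard shorthands: $\mathit{true}$ for $\neg\mathit{false}$, $\phi_1 \wedge \phi_2$ for $\neg((\neg\phi_1) \vee (\neg\phi_2))$, $\phi_1 \rightarrow \phi_2$ for $(\neg\phi_1) \vee \phi_2$, $\exists\Diamond_{\sim\theta}\phi_1$ for $\exists\,\mathit{true}\,\mathcal{U}_{\sim\theta}\phi_1$, $\forall\Box_{\sim\theta}\phi_1$ for $\neg\exists\Diamond_{\sim\theta}\neg\phi_1$, $\forall\Diamond_{\sim\theta}\phi_1$ for $\forall\,\mathit{true}\,\mathcal{U}_{\sim\theta}\phi_1$, and $\exists\Box_{\sim\theta}\phi_1$ for $\neg\forall\Diamond_{\sim\theta}\neg\phi_1$.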

With different controllers and interpretations, a PCTL formula may impose 
different requirements. 

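Definition 8. (Satisfaction of PCTL Formulas): We write $(q, \nu) \models^{\chi}_{I} \phi$ to mean that $\phi$ is satisfied at state $(q, \nu)$ in $A$ under controller $\chi$ and interpretation $I$. The satisfaction relation is defined inductively as follows.

• The base case of $\phi \in B(Q, X, H)$ was previously defined, except that $(q, \nu) \models^{\chi}_{I} q'$ iff $q = q'$;

• $(q, \nu) \models^{\chi}_{I} \phi_1 \vee \phi_2$ iff either $(q, \nu) \models^{\chi}_{I} \phi_1$ or $(q, \nu) \models^{\chi}_{I} \phi_2$;

• $(q, \nu) \models^{\chi}_{I} \neg\phi_1$ iff $(q, \nu) \not\models^{\chi}_{I} \phi_1$;

• $(q, \nu) \models^{\chi}_{I} \exists\phi_1 \mathcal{U}_{\sim\theta} \phi_2$ iff there exist a $(q, \nu)$-run $((q_1, \nu_1, t_1), (q_2, \nu_2, t_2), \ldots)$ in $A$, an $i \geq 1$, and a $\delta \in [0, t_{i+1} - t_i]$, s.t.

 - $t_i + \delta \sim t_1 + I(\theta)$,

 - $(q_i, \nu_i) + \delta \models^{\chi}_{I} \phi_2$,

 - for all $j, \delta'$ s.t. either $(0 < j < i) \wedge (\delta' \in [0, t_{j+1} - t_j])$ or $(j = i) \wedge (\delta' \in [0, \delta))$, $(q_j, \nu_j) + \delta' \models^{\chi}_{I} \phi_1$.

 (In words, there exists a $(q, \nu)$-run along which $\phi_2$ eventually holds at some point in time $t_i + \delta \sim t_1 + I(\theta)$ within the interval $[t_i, t_{i+1}]$ for some $i$, and before reaching that point $\phi_1$ always holds.)

• $(q, \nu) \models^{\chi}_{I} \forall\phi_1 \mathcal{U}_{\sim\theta} \phi_2$ iff for every $(q, \nu)$-run $((q_1, \nu_1, t_1), (q_2, \nu_2, t_2), \ldots)$ in $A$, there exist some $i \geq 1$ and $\delta \in [0, t_{i+1} - t_i]$ s.t.

 - $t_i + \delta \sim t_1 + I(\theta)$,

 - $(q_i, \nu_i) + \delta \models^{\chi}_{I} \phi_2$,

 - for all $j, \delta'$ s.t. either $(0 < j < i) \wedge (\delta' \in [0, t_{j+1} - t_j])$ or $(j = i) \wedge (\delta' \in [0, \delta))$, $(q_j, \nu_j) + \delta' \models^{\chi}_{I} \phi_1$.

Given an SPP $A$, a PCTL formula $\phi$, a controller $\chi$, and an interpretation $I$ for $H$, we say $A$ is a model of $\phi$ under $\chi$ and $I$, written $A \models^{\chi}_{I} \phi$, iff $(q_0, \mathbf{0}) \models^{\chi}_{I} \phi$, where $\mathbf{0}$ is the mapping that maps all clocks to zero. ||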
2.3 Formal Definition of Problem

A performance measure is just a linear expression like $\sum_i a_i \alpha_i$, where the $a_i$'s are integers (negative or nonnegative) and the $\alpha_i$'s are parameters in $H$. It represents a metric that the users want to maximize in their system design.

Definition 9. (Parametric Optimization Problem): Given an SPP $A$, a PCTL formula $\phi$, and a performance measure $\lambda$ ($= \sum_i a_i \alpha_i$), the parametric optimization problem instance for $A$, $\phi$, and $\lambda$, denoted $PO(A, \phi, \lambda)$, is formally defined as the problem of deriving the value $\max\{\sum_i a_i I(\alpha_i) \mid \exists\chi\,(A \models^{\chi}_{I} \phi)\}$, taken over all interpretations $I$, if it exists. ||



3 Algorithm 

Our algorithm consists of two steps. First, we extend the parametric analysis algorithm for computer systems [17, 18] with controller-choice information. The modified algorithm generates a constraint describing the necessary and sufficient condition on parameters and controller choices for a solution to a given $PO(A, \phi, \lambda)$. The second step then uses various techniques from linear algebra to derive the maximum of $\lambda$ under that constraint.

For the first step, we define controlled region graphs (CR-graphs) and controlled path characterizations (CP-characterizations) for the parametric analysis of controllable timed automata. CR-graphs are like the parametric region graphs introduced in [17, 18]. They are also similar to the region graphs defined in [1] but contain parametric information. A region is a subset of the state space in which all states exhibit the same behavior with respect to the given SPP and PCTL formula.

A CP-characterization is derived for each pair of regions in a CR-graph. For each $t \in \mathbb{N}$, it gives a necessary and sufficient condition for the existence of a finite run of $t$ time units from the source region to the destination region. CP-characterizations will be useful for constructing the constraints associated with existential path quantifiers ($\exists$).

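We need the following three types of integer set manipulations. Given $T_1, T_2 \subseteq \mathbb{N}$,

• $T_1 \cup T_2$ means $\{a \mid a \in T_1 \text{ or } a \in T_2\}$.

• $T_1 + T_2$ means $\{a_1 + a_2 \mid a_1 \in T_1;\ a_2 \in T_2\}$.

• $T_1^*$ means $\{0\} \cup \bigcup_{i \in \mathbb{N}} T_1^{+i}$, where $T_1^{+i}$ means the addition of $i$ consecutive copies of $T_1$.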

As we shall see later, the notion of semilinear sets¹ is crucial in our algorithm for constructing the CP-characterizations. In fact, it will be shown that all integer sets resulting from the above manipulations in our algorithm are semilinear. Semilinear expressions are convenient notations for expressing regularly constructed infinite integer sets. They are also closed under the three manipulations.

¹ A semilinear integer set is expressible as the union of a finite number of integer sets like $\{a + b_1 j_1 + \cdots + b_n j_n \mid j_1, \ldots, j_n \in \mathbb{N}\}$ for some $a, b_1, \ldots, b_n \in \mathbb{N}$.
There are also algorithms to compute the results of these manipulations. Specifically, all semilinear expressions can be represented as the union of a finite number of sets like $a + c^*$ (a shorthand for $\{a + c \cdot h \mid h \geq 0\}$). This special form is called the periodical normal form (PNF). It is not difficult to prove that, given operands in PNF, the results of the three manipulations can all be transformed back into PNF [15]. Due to the page limit, we skip the details here.
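The closure of semilinear sets under $\cup$, $+$, and $^*$ can be checked with a small Python sketch. It does not convert to PNF (the paper defers that to [15]); instead, as an assumption of this illustration, it keeps each set as a finite union of linear sets $b + p_1^* + \cdots + p_k^*$, which is closed under all three operations: union concatenates the terms, addition sums bases and collects periods pairwise, and $(b + P^*)^* = \{0\} \cup (b + (\{b\} \cup P)^*)$. Membership is decided by a small coin-change style dynamic program. The names `Linear` and `Semilinear` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Linear:
    """The linear set {base + h1*p1 + ... + hk*pk | h1, ..., hk >= 0}."""
    base: int
    periods: Tuple[int, ...] = ()

    def contains(self, t: int) -> bool:
        r = t - self.base
        if r < 0:
            return False
        steps = [p for p in self.periods if p > 0]
        reach = [True] + [False] * r          # reach[n]: n is a sum of periods
        for n in range(1, r + 1):
            reach[n] = any(p <= n and reach[n - p] for p in steps)
        return reach[r]


class Semilinear:
    """A finite union of linear sets."""

    def __init__(self, terms: Iterable[Linear]):
        self.terms: FrozenSet[Linear] = frozenset(terms)

    def __or__(self, other: "Semilinear") -> "Semilinear":      # T1 ∪ T2
        return Semilinear(self.terms | other.terms)

    def __add__(self, other: "Semilinear") -> "Semilinear":     # T1 + T2
        return Semilinear(Linear(a.base + b.base, a.periods + b.periods)
                          for a, b in product(self.terms, other.terms))

    def star(self) -> "Semilinear":                             # T1*
        result = Semilinear([Linear(0)])
        for lin in self.terms:
            # (b + P*)* = {0} ∪ (b + ({b} ∪ P)*); the star of a union is the sum of stars
            result = result + Semilinear([Linear(0), Linear(lin.base, lin.periods + (lin.base,))])
        return result

    def contains(self, t: int) -> bool:
        return any(lin.contains(t) for lin in self.terms)


ONE = Semilinear([Linear(1)])
expr = ONE + ONE.star() + ONE.star()      # the set written 1 + 1* + 1* in Table 1
```

Keeping several periods per linear set avoids the PNF conversion step at the cost of a slower membership test; a PNF-based implementation would normalize after every operation instead.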

3.1 CR-Graphs 

The classic concept of region graphs was originally discussed and used in [1] for verifying dense-time systems. Our CR-graphs extend region graphs with constraints on parameter variables. Besides parameter variables, our CR-graphs have an auxiliary clock $\kappa$ which is reset to zero whenever its reading reaches one. The reading of $\kappa$ is therefore always in $[0, 1)$; that is, for every state $(q, \nu)$, $0 \leq \nu(\kappa) < 1$. $\kappa$ is not used in the user-given SPP and is added only when we construct the regions, for the convenience of parametric timing analysis. It functions as a ticking indicator for evaluating the timed modal formulas of PCTL. From now on, we assume that $\kappa \in X$.

The timing constants in an SPP $A$ are the integer constants $c$ that appear in conditions such as $x - y \sim c$ and $x \sim c$ in $A$. The timing constants in a PCTL formula $\phi$ are the integer constants $c$ that appear in subformulas like $x - y \sim c$, $x \sim c$, $\exists\phi_1 \mathcal{U}_{\sim c} \phi_2$, and $\forall\phi_1 \mathcal{U}_{\sim c} \phi_2$. Let $C_{A:\phi}$ be the largest timing constant used in $A$ or $\phi$ for the given $PO(A, \phi, \lambda)$.

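Given a state $(q, \nu)$, $(q, \nu) \models (\mathit{fr}(\kappa) = 0)$ iff $\nu(\kappa)$ is an integer (i.e., iff $\nu(\kappa) = 0$, since $0 \leq \nu(\kappa) < 1$), where $\mathit{fr}(\kappa)$ denotes the fractional part of the reading of $\kappa$.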

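Definition 10. (Regions): A region of a state $(q, \nu)$ for $PO(A, \phi, \lambda)$ is a pair $(q, [\nu])$ such that $[\nu]$, called the clock region of $\nu$, is the set of timing inequalities characterizing $\nu$, that is,

$[\nu] = \left\{ x \sim c \;\middle|\; x \in X;\ \nu(x) \sim c;\ 0 \leq c \leq C_{A:\phi} \right\} \cup \left\{ x - y \sim c \;\middle|\; x, y \in X;\ \nu(x) - \nu(y) \sim c;\ 0 \leq c \leq C_{A:\phi} \right\} \cup \{\mathit{fr}(\kappa) = 0\}$

where $c$ is a nonnegative integer and the literal $\mathit{fr}(\kappa) = 0$ is included only when $\nu(\kappa)$ is an integer. Given a state $(q, \nu)$, we say $(q, \nu) \in (q, [\nu])$. Specifically, $[\mathbf{0}]$ is the clock region of the mapping $\mathbf{0}$. ||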
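A direct reading of Definition 10 can be sketched in Python: the function below collects, for a valuation `nu` and the constant $C_{A:\phi}$ (passed as `cmax`), every literal $x \sim c$ and $x - y \sim c$ with $0 \leq c \leq C_{A:\phi}$, and adds $\mathit{fr}(\kappa) = 0$ when $\kappa$ reads zero. It records only the relations $<$, $=$, $>$, since $\leq$ and $\geq$ follow from them; two valuations lie in the same clock region exactly when the function returns the same set. The clock name `kappa` and the tuple encoding of literals are assumptions of this sketch.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Tuple


def rel(value: float, c: int) -> str:
    """Which of <, =, > holds between value and c."""
    return "<" if value < c else ("=" if value == c else ">")


def clock_region(nu: Dict[str, float], cmax: int) -> FrozenSet[Tuple]:
    """Literal set [nu] of Definition 10, restricted to the relations <, =, >."""
    lits = set()
    for x, vx in nu.items():
        for c in range(cmax + 1):
            lits.add((x, rel(vx, c), c))                          # x ~ c
    for x, y in permutations(nu, 2):
        for c in range(cmax + 1):
            lits.add((x, "-", y, rel(nu[x] - nu[y], c), c))       # x - y ~ c
    if nu["kappa"] == 0.0:          # fr(kappa) = 0, since 0 <= kappa < 1
        lits.add(("fr(kappa)", "=", 0))
    return frozenset(lits)
```

Because all comparisons are against constants no larger than `cmax`, readings beyond $C_{A:\phi}$ collapse into the same region, which is what keeps the region set finite.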

Our region definition resembles the one in [13]. Interested readers are referred to 
[1] for an alternative definition of regions based on the notion of region equiva- 
lence. 

Definition 11. (Controlled Region Graph, CR-Graph): The CR-graph for a $PO(A, \phi, \lambda)$ with $A = (Q, q_0, X, H, \mu, E, \tau, \pi)$ is a directed labelled graph $G_{A:\phi} = (V, F)$ such that the vertex set $V$ is the set of all regions and the arc set $F \subseteq V \times V$ consists of the following two types of arcs $(v, v')$ with $v = (q, \Lambda)$ and $v' = (q', \Lambda')$.

• [mode transitions] $(v, v') \in F$ iff for all $(q, \nu) \in v$, there are some $\chi$, $I$, and $(q', \nu') \in v'$ such that $(q, \nu) \rightarrow^{\chi}_{I} (q', \nu')$.

• [time passage] $q = q'$ and for every state $(q, \nu) \in v$, there is a state $(q, \nu') \in v'$ such that

 - $(q, \nu) + \delta = (q, \nu')$ for some $\delta \in \mathbb{R}^+$, and

 - there is no $\delta' \in \mathbb{R}^+$ with $0 < \delta' < \delta$ such that $(q, \nu) + \delta'$ is in neither $v$ nor $v'$. ||



According to [6, 7, 13], in the controller synthesis problem of timed automata, there is a controller which is a function of states iff there is a controller which is a function of regions. We therefore follow their approach and treat $\sigma$ as a function of regions. We want to derive arc constraints on static parameters and controller choices in CR-graphs so as to construct CP-characterizations. For convenience, given a region $v = (q, \Lambda)$ and a state predicate $\eta$, we write $v(\eta)$ for the static state predicate and synthesis decision extracted from $\eta$ according to the following rules.

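• $v(\mathit{false})$ is $\mathit{false}$.

• $v(\sigma) = \sigma(v)$.

• $v(x - y \sim c)$ is $\mathit{true}$ if $x - y \sim c \in \Lambda$, and $\mathit{false}$ otherwise.

• $v(x \sim c)$ is $\mathit{true}$ if $x \sim c \in \Lambda$, and $\mathit{false}$ otherwise.

• $v(\sum_i a_i \alpha_i \sim c)$ is $\sum_i a_i \alpha_i \sim c$.

• $v(\eta_1 \vee \eta_2) = v(\eta_1) \vee v(\eta_2)$.

• $v(\neg\eta_1) = \neg v(\eta_1)$.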
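The extraction rules for $v(\eta)$ amount to a small recursive simplifier. The Python sketch below encodes state predicates as nested tuples (an encoding chosen for this illustration only), decides clock literals by membership in $\Lambda$, turns $\sigma$ into the atom $\sigma(v)$, keeps static literals symbolic, and folds Boolean constants. It also accepts `and` directly as the usual shorthand for $\neg(\neg\eta_1 \vee \neg\eta_2)$.

```python
from typing import FrozenSet, Union

Formula = Union[bool, str, tuple]   # True/False, an atom string, or a tuple node


def v_of(region: str, lits: FrozenSet[tuple], eta: tuple) -> Formula:
    """v(eta) for region v = (q, Lambda): clock literals are decided by Lambda,
    sigma becomes the atom sigma(v), and static literals are kept symbolically."""
    tag = eta[0]
    if tag == "false":
        return False
    if tag == "sigma":
        return f"sigma({region})"
    if tag == "clock":                      # x ~ c or x - y ~ c
        return eta[1] in lits
    if tag == "static":                     # sum a_i alpha_i ~ c, e.g. "cost > 10"
        return eta[1]
    if tag == "not":
        a = v_of(region, lits, eta[1])
        return (not a) if isinstance(a, bool) else ("not", a)
    if tag in ("or", "and"):
        a, b = v_of(region, lits, eta[1]), v_of(region, lits, eta[2])
        absorbing, neutral = (True, False) if tag == "or" else (False, True)
        if a is absorbing or b is absorbing:
            return absorbing
        if a is neutral:
            return b
        if b is neutral:
            return a
        return (tag, a, b)
    raise ValueError(f"unknown node {tag!r}")
```

For instance, applying it to the triggering condition $\sigma \wedge cost > 10$ in a region named `v1` yields `("and", "sigma(v1)", "cost > 10")`, i.e., exactly the static predicate plus synthesis decision that the rules prescribe.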

A mode transition arc $(v, v') \in F$ with $v = (q, \Lambda)$ and $v' = (q', \Lambda')$ exists iff (1) $\mu(q) \wedge \tau(q, q')$ is satisfied at all states in $v$, and (2) after the transition, $\bigwedge_{p \in \pi(q, q')(\Lambda)} p$ is satisfied at all states in $v'$, where $\pi(q, q')(\Lambda)$ is the new clock region identical to $\Lambda$ except that all clocks in $\pi(q, q')$ are reset to zero. The two conditions can be formulated as follows.

$\mathit{xtion}(v, v') = v\left(\bigwedge_{p \in \Lambda} p \wedge \mu(q) \wedge \tau(q, q')\right) \wedge v'\left(\bigwedge_{p' \in \Lambda'} p' \wedge \mu(q') \wedge \bigwedge_{p \in \pi(q, q')(\Lambda)} p\right)$

Note that formulas like $\bigwedge_{p \in \Lambda} p \wedge \mu(q) \wedge \tau(q, q')$ extract the truly active part of $\mu(q) \wedge \tau(q, q')$ in region $(q, \Lambda)$. The applications of $v()$ and $v'()$ then extract the constraints on static parameters as well as on the control symbol $\sigma$, should they exist.

The constraints on time passage arcs hinge on basic constraints on the relations between $\Lambda$ and $\Lambda'$, which are free of static parameters. Such constraints can be determined with standard techniques like symbolic weakest precondition calculation [12]. We therefore assume the availability of a procedure $\mathit{timed}(\Lambda, \Lambda')$ which is true iff $\Lambda$ and $\Lambda'$ are related by the region time passage relation. Thus a time passage arc $(v, v') \in F$ with $v = (q, \Lambda)$ and $v' = (q', \Lambda')$ exists iff (1) $q = q'$; (2) $\mathit{timed}(\Lambda, \Lambda')$; and (3) $\mu(q)$ is satisfied at all states in $v$ and $v'$. For convenience, we let

$\mathit{timed}(v, v') = (q = q') \wedge \mathit{timed}(\Lambda, \Lambda') \wedge v\left(\mu(q) \wedge \bigwedge_{p \in \Lambda} p\right) \wedge v'\left(\mu(q') \wedge \bigwedge_{p' \in \Lambda'} p'\right)$



3.2 CP-Characterization 

Given a PCTL formula $\phi_1$ and a path $\Gamma = (v_1 v_2 \cdots v_m)$ with $v_i = (q_i, \Lambda_i)$, $\Gamma$ is called a $\phi_1$-path iff for some controller $\chi$ and some interpretation $I$, we can embed a valid computation in the path such that along the path, all regions except the last one satisfy $\phi_1$. Likewise, a cycle $\Gamma = (v_1 v_2 \cdots v_1)$ is a $\phi_1$-cycle if all regions along the cycle satisfy $\phi_1$. A $\phi_1$-path (cycle) is $t$ time units long iff along the path (cycle), exactly $t$ arcs take the reading of $\kappa$ from a noninteger to an integer.

Fig. 2. Central Operation in Our Kleene's Closure Algorithm.

To represent CP-characterizations, we use pairs like $(\eta, T)$, where $\eta$ is a state predicate and $T$ is an integer set. Suppose that, for each $v \in V$, the constraint under which subformula $\phi_1$ is satisfied by the states in $v$ is $L[\phi_1](v)$. For $v, v' \in V$ and a subformula $\phi_1$, the CP-characterization for $v, v'$ is denoted $CP[\phi_1](v, v')$. Conceptually, there is a finite $\phi_1$-path $v_1, \ldots, v_m$, with $v_i = (q_i, \Lambda_i)$, of $t$ time units iff there is a $(\eta, T) \in CP[\phi_1](v_1, v_m)$ such that $t \in T$, the path is a valid computation with respect to $\eta$ (which specifies the satisfiable constraints on parameters and controller choices), and the controller choice is consistent at the replicated regions in the path.

Now we give a procedure for deriving the CP-characterization for each pair of regions. The kernel of the procedure is a Kleene's closure computation with an intuitive scheme of vertex bypassing. Suppose we have three regions $u$, $v$, and $w$ whose connections in the CR-graph are shown in Figure 2. By bypassing region $v$, we realize that $CP[\phi_1](u, w)$ should be a minimal superset of $P[\phi_1](u, v, w)$, defined as

$P[\phi_1](u, v, w) = \left\{ \left( \eta_1 \wedge \eta_2 \wedge \bigwedge_{(\eta_3, T_3) \in D} \eta_3,\ T_1 + T_2 + \sum_{(\eta_3, T_3) \in D} T_3^* \right) \;\middle|\; (\eta_1, T_1) \in CP[\phi_1](u, v);\ (\eta_2, T_2) \in CP[\phi_1](v, w);\ D \subseteq CP[\phi_1](v, v) \right\}$

for all intermediate nodes $v$. Note that in the calculation of $CP[\phi_1](u, w)$, the time set component $T$ always remains semilinear.
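The bypass set $P[\phi_1](u, v, w)$ can be prototyped in a few lines of Python. In this sketch, an entry of $CP$ is a pair of a conjunction $\eta$ (a frozenset of atom strings) and a linear time set $b + p_1^* + \cdots + p_k^*$ (a base and a tuple of periods). Because $T_3^*$ of a linear set is the union $\{0\} \cup (b + (\{b\} \cup P)^*)$, each choice of a subset $D \subseteq CP[\phi_1](v, v)$ can yield several entries with the same $\eta$. The representation and all function names are assumptions made for illustration, not the paper's data structures.

```python
from itertools import chain, combinations, product
from typing import Dict, FrozenSet, Set, Tuple

Lin = Tuple[int, Tuple[int, ...]]            # (base, periods): {base + sum h_j * p_j}
Entry = Tuple[FrozenSet[str], Lin]           # (eta as a conjunction of atoms, time set)
CP = Dict[Tuple[str, str], Set[Entry]]


def lin(base: int, periods=()) -> Lin:
    # duplicate or zero periods do not change the set, so normalize them away
    return (base, tuple(sorted({p for p in periods if p > 0})))


def lin_add(a: Lin, b: Lin) -> Lin:
    return lin(a[0] + b[0], a[1] + b[1])


def lin_star(a: Lin):
    b, ps = a
    return [lin(0), lin(b, ps + (b,))]       # {0} ∪ (b + ({b} ∪ P)*)


def member(t: int, a: Lin) -> bool:
    base, ps = a
    r = t - base
    if r < 0:
        return False
    reach = [True] + [False] * r
    for n in range(1, r + 1):
        reach[n] = any(p <= n and reach[n - p] for p in ps)
    return reach[r]


def bypass(cp: CP, u: str, v: str, w: str) -> Set[Entry]:
    """P[phi](u, v, w): go u -> v, loop on v via any subset D of CP(v, v), then v -> w."""
    loops = list(cp.get((v, v), ()))
    subsets = chain.from_iterable(combinations(loops, r) for r in range(len(loops) + 1))
    out: Set[Entry] = set()
    for d in subsets:
        eta_d = frozenset().union(*(e for e, _ in d))
        for (eta1, t1), (eta2, t2) in product(cp.get((u, v), ()), cp.get((v, w), ())):
            for pick in product(*(lin_star(t3) for _, t3 in d)):
                t = lin_add(t1, t2)
                for s in pick:
                    t = lin_add(t, s)
                out.add((eta1 | eta2 | eta_d, t))
    return out
```

With a one-unit loop on $v$, bypassing $v$ between $u$ and $w$ yields the entries with time sets `(1, ())` and `(2, (1,))`, i.e., $\{1\} \cup (2 + 1^*)$, which covers every path that loops on $v$ any number of times.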

The procedure for computing $CP[\phi_1]()$ is presented in the following.

KClosure[φ1](V, F) /* F ⊆ V × V. It is also assumed that for all regions v ∈ V, we
                      know the constraint L[φ1](v) which makes φ1 satisfied at every state in v. */
{
  For each (u, w) ∉ F, CP[φ1](u, w) := ∅;
  For each (u, w) ∈ F with u = (q_u, Λ_u) and w = (q_w, Λ_w), do {
    let η := L[φ1](u) ∧ (xtion(u, w) ∨ timed(u, w));
    if fr(κ) = 0 ∉ Λ_u and fr(κ) = 0 ∈ Λ_w, CP[φ1](u, w) := {(η, 1)};
    else CP[φ1](u, w) := {(η, 0)};
  }
  For i := 0 to |V|, do
    Iteratively for each v ∈ V, do
      for each u, w ∈ V, let CP[φ1](u, w) := CP[φ1](u, w) ∪ P[φ1](u, v, w);
}
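The first two loops of KClosure[$\phi_1$]() can be transcribed almost literally. The Python sketch below, with hypothetical names, builds the length-one entries of $CP$ from a table of arc constraints ($\mathit{xtion}(u, w) \vee \mathit{timed}(u, w)$, given here as opaque strings) and charges one time unit exactly when $\mathit{fr}(\kappa) = 0$ holds in the target region but not in the source. Pairs that are not arcs are simply absent, which plays the role of $CP[\phi_1](u, w) := \emptyset$. Which regions of Figure 3 satisfy $\mathit{fr}(\kappa) = 0$ is assumed from the region descriptions in Example 2; with $\phi_1 = \mathit{true}$, the resulting time components agree with the first row of Table 1.

```python
from typing import Callable, Dict, Set, Tuple

Arc = Tuple[str, str]


def init_cp(arcs: Dict[Arc, str], label: Dict[str, str],
            kappa_zero: Callable[[str], bool]) -> Dict[Arc, Set[Tuple[str, int]]]:
    """Length-one CP entries: eta = L[phi1](u) and the arc constraint; time 0 or 1."""
    cp: Dict[Arc, Set[Tuple[str, int]]] = {}
    for (u, w), arc_constraint in arcs.items():
        eta = f"({label[u]}) and ({arc_constraint})"
        # one time unit when kappa moves from a noninteger to an integer reading
        charge = 1 if (not kappa_zero(u) and kappa_zero(w)) else 0
        cp[(u, w)] = {(eta, charge)}
    return cp


# Regions of Figure 3 whose clock region contains fr(kappa) = 0 (assumed)
ZERO = {"v0", "v3", "v4", "v6"}
ARCS = {("v0", "v1"): "true", ("v1", "v3"): "true", ("v1", "v2"): "sigma(v1) and cost > 10",
        ("v3", "v4"): "sigma(v3) and cost > 10", ("v2", "v4"): "true",
        ("v4", "v5"): "true", ("v5", "v6"): "true", ("v6", "v5"): "true"}
cp = init_cp(ARCS, {r: "true" for r in ["v0", "v1", "v2", "v3", "v4", "v5", "v6"]},
             lambda r: r in ZERO)
```

The bypass loop that follows in KClosure would then repeatedly merge $P[\phi_1](u, v, w)$ into these entries.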



The first two for-loops set up the initial values for paths of length one (i.e., directly connected regions). Notice that one unit of time is charged to an arc if the reading of $\kappa$ changes from a noninteger to an integer along it. One important point in the design of KClosure[]() is to ensure that the controller always makes a consistent decision in each region. This is enforced by the individual controller-choice constraints for each arc along the path. More precisely, if, for example, $\sigma$ is set to 'true' at some point in time for a region $u$ and set to 'false' later in the computation for the same region, then both $\sigma(u)$ and $\neg\sigma(u)$ appear conjunctively in the predicate characterizing the path, which guarantees that all satisfiable paths are controller-choice consistent. Lemma 1 establishes the correctness of KClosure[$\phi_1$]().

Lemma 1. Suppose we are given the labeling function $L[\phi_1]$ for a PCTL formula $\phi_1$ on $(V, F)$ and a natural number $t \in \mathbb{N}$. After running algorithm KClosure[$\phi_1$]$(V, F)$, the following two statements are equivalent.

1. There is an $(\eta, T) \in CP[\phi_1](v, v')$ such that $\eta$ is satisfiable and $t \in T$;

2. There is a computation of $t$ time units from a state $(q, \nu) \in v$ to a state $(q', \nu') \in v'$ under some controller $\chi$ and some interpretation $I$ such that $\phi_1$ is satisfied in all but the last state of the computation.

Proof Sketch: The forward direction, from item 1 to item 2, is easy to prove. The backward direction relies on showing that the choice-consistency constraint (logically, $\sigma(q \wedge \bigwedge_{p \in \Lambda} p)$) is sufficient. In the jargon of [13], we need to show that the existence of a $C_{A:\phi}$-polyhedral solution controller is a necessary condition for the existence of any solution controller. The proof idea is to transform a non-$C_{A:\phi}$-polyhedral solution controller into a $C_{A:\phi}$-polyhedral one. We can then prove, by structural induction on formulas, that the new controller satisfies the same set of modal formulas as the old one. ||



3.3 Nonzenoness 

Zenoness is an undesirable anomaly in real-time computations in which clock readings converge to finite values. Certainly, we do not want such an anomaly to sneak into the constraints derived for the existence of interpretations and controllers. To avoid zenoness, we adopt the same approach used in [16]. A state is nonzeno iff from that state on, there is always a computation along which $\kappa$ gets reset infinitely often. In PCTL, this is $\exists\Box_{>0}\phi_j$ for some $\phi_j \not\equiv \mathit{false}$. It can be expressed as the following constraint on regions.

$L[\exists\Box_{>0}\phi_j](v) = \bigvee_{u \in V} \left( \left(\bigvee_{(\eta, T) \in CP[\phi_j]((\kappa)v, u)} \eta\right) \wedge \left(\bigvee_{(\eta, T) \in CP[\phi_j](u, u)} (\eta \wedge T > 0)\right) \right)$

where $(\kappa)v$ is the region in the CR-graph that agrees with $v$ in every aspect except that, if $(\kappa)v = (q, \Lambda)$, then $\mathit{fr}(\kappa) = 0 \in \Lambda$. The constraint essentially says that from $(\kappa)v$, we can reach a cycle of nonzero time.
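Ignoring the parameter and controller constraints $\eta$, the nonzenoness condition above reduces to a graph question: can $(\kappa)v$ reach a cycle that contains at least one arc charging a time unit? The Python sketch below answers only that structural question (a simplification for illustration; the constraint in the text additionally conjoins the $\eta$'s along the way). It relies on the fact that an arc $(u, w)$ lies on a cycle iff $u$ is reachable from $w$.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def reachable(graph: Dict[str, list], src: str) -> Set[str]:
    """All nodes reachable from src (including src itself), by depth-first search."""
    seen, stack = {src}, [src]
    while stack:
        n = stack.pop()
        for m in graph[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def nonzeno_regions(arcs: Dict[Tuple[str, str], int]) -> Set[str]:
    """arcs maps (u, w) to its time charge (0 or 1). Returns the regions that can
    reach a cycle containing an arc of charge 1."""
    graph = defaultdict(list)
    nodes: Set[str] = set()
    for u, w in arcs:
        graph[u].append(w)
        nodes |= {u, w}
    reach = {n: reachable(graph, n) for n in nodes}
    # sources of charged arcs that lie on some cycle
    ticking = {u for (u, w), t in arcs.items() if t == 1 and u in reach[w]}
    return {n for n in nodes if reach[n] & ticking}
```

On the arcs of Figure 3 with the time charges of the first row of Table 1, every region is structurally nonzeno, because all of them can reach the charged loop between $v_5$ and $v_6$.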

3.4 Labeling Algorithm 

Once the CP-characterizations for $\phi_1$ have been constructed, we can turn to the labeling algorithm to calculate the parametric conditions for the modal formulas properly containing $\phi_1$. However, one thing must be defined clearly before we present the labeling algorithm: how should we derive parameter constraints from pairs like $(\eta, T)$ in CP-characterizations? Suppose we want to examine whether there is a run from $v$ to $v'$ with time $> \theta$. To do this, we define semilinear conditions of the form $T \sim \theta$, with $T$ a semilinear expression in PNF and $\theta$ a (numerical or variable) parameter, calculated according to the following rewriting rules.

• $a + c^* \sim \theta \;\Rightarrow\; a + c\,j \sim \theta$, where $j$ is a new integer variable never used before.

• $T_1 \cup T_2 \sim \theta \;\Rightarrow\; (T_1 \sim \theta) \vee (T_2 \sim \theta)$.

Note that since we assume that the operands are in PNF, we do not have to handle the cases of $+$ and $*$. Then, the condition that there is a run with time $> \theta$ from $v$ to $v'$ can be calculated as $\bigvee_{(\eta, T) \in CP[\phi_1](v, v')} (\eta \wedge T > \theta)$.
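The two rewriting rules translate directly into code. The Python sketch below turns a PNF time set, given as a list of hypothetical `(a, c)` pairs standing for $a + c^*$, into a textual constraint on $\theta$. Following the rule, every term gets its own fresh integer variable, and the sketch makes the implicit side condition $j \geq 0$ explicit; a shared counter keeps variable names fresh across calls. Producing strings rather than solver objects is purely for illustration.

```python
import itertools
from typing import Iterator, List, Optional, Tuple


def rewrite(pnf: List[Tuple[int, int]], op: str, theta: str,
            fresh: Optional[Iterator[int]] = None) -> str:
    """Rewrite T ~ theta, for T a union of a + c* terms, into a disjunction.
    Each term with c > 0 gets a fresh integer variable j_k >= 0."""
    fresh = fresh if fresh is not None else itertools.count(1)
    parts = []
    for a, c in pnf:
        if c == 0:                      # a + 0* is just {a}
            parts.append(f"({a} {op} {theta})")
        else:
            j = f"j{next(fresh)}"
            parts.append(f"({a} + {c}*{j} {op} {theta} and {j} >= 0)")
    return " or ".join(parts) if parts else "false"   # empty union: no run at all
```

For example, `rewrite([(1, 0), (2, 1)], "<", "theta")` produces `(1 < theta) or (2 + 1*j1 < theta and j1 >= 0)`, the instance of the union rule followed by the PNF rule.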

In the following, we present the labeling algorithm for $L[\phi](v)$ as the traditional inductive case analysis of formula $\phi$.



Label(A, φ) {
  (1) construct the CR-graph G_{A:φ} = (V, F);
  (2) return L[φ]((q0, [0]));
}

L[φ1](v) /* v = (q, Λ) */ {
  switch (φ1) {
  case (false): L[false](v) := false;
  case (φ1 = q') where q' ∈ Q: L[q'](v) := true if q = q', else L[q'](v) := false;
  case (x − y ∼ c or x ∼ c): L[φ1](v) := true iff φ1 ∈ Λ;
  case (Σ a_i α_i ∼ d): L[Σ a_i α_i ∼ d](v) := Σ a_i α_i ∼ d;
  case (φj ∨ φk): L[φj ∨ φk](v) := L[φj](v) ∨ L[φk](v);
  case (¬φj): L[¬φj](v) := ¬L[φj](v);
  case (∃□_{>0} φj): {
    (1) KClosure[φj](V, F);
    (2) L[∃□_{>0} φj](v) := ⋁_{u ∈ V} ( ⋁_{(η,T) ∈ CP[φj]((κ)v, u)} η ∧ ⋁_{(η,T) ∈ CP[φj](u, u)} (η ∧ T > 0) );
  }
  case (∃φj U_{<θ} φk): {
    (1) KClosure[φj](V, F);
    (2) L[∃φj U_{<θ} φk](v) := ⋁_{u ∈ V} ( L[φk](u) ∧ L[∃□_{>0} true](u) ∧ ⋁_{(η,T) ∈ CP[φj]((κ)v, u)} (η ∧ T < θ) );
  }
  cases (∃φj U_{≤θ} φk, ∃φj U_{=θ} φk, ∃φj U_{≥θ} φk, ∃φj U_{>θ} φk) can be treated similarly to the last case.
  case (∀φj U_{<θ} φk): {
    (1) KClosure[φj](V, F);
    (2) L[∀φj U_{<θ} φk](v) := ¬L[∃(¬φk) U_{<θ} ¬(φj ∨ φk)]((κ)v)
          ∧ ¬ ⋁_{v1 = (q1, Λ1), v2 = (q2, Λ2) ∈ V; fr(κ) = 0 ∈ Λ1; fr(κ) = 0 ∉ Λ2} ( … ∧ L[∃□_{>0} true](v2) );
  }
  cases (∀φj U_{≤θ} φk, ∀φj U_{=θ} φk, ∀φj U_{≥θ} φk, ∀φj U_{>θ} φk) can be treated similarly to the last case.
  }
}



There are two things worth mentioning in the algorithm. First, nonzenoness is properly handled because the algorithm requires that all computations have a suffix computation satisfying $\exists\Box_{>0}(\ldots)$. Second, the case $\forall\phi_j \mathcal{U}_{<\theta} \phi_k$ is handled as the negation of the existence of two types of counterexamples. The first counterexample type, $\exists(\neg\phi_k)\mathcal{U}_{<\theta}\neg(\phi_j \vee \phi_k)$, says that $\phi_k$ is not fulfilled before $\phi_j$ becomes false within time $< \theta$. The rest of the formula covers the second counterexample type, which says that along some computation, $\phi_k$ is never true within time $< \theta$. Note that to characterize an interval which stops right at the integer $\theta$, we need the constraint $\mathit{fr}(\kappa) = 0 \in \Lambda_1 \wedge \mathit{fr}(\kappa) = 0 \notin \Lambda_2$.

Example 2. A Simple CR-Graph. For the automaton in Figure 1, the associated region graph is shown in Figure 3. To represent a region succinctly, we write down only the (true and false) mode names and the indispensable inequalities, as a conjunction; all inequalities which can be deduced from others are omitted. Also, for ease of explanation, some of the edges of the region graph are annotated with ↑, ↓, or the constraints (over $\sigma$ and parameters) under which the associated transitions become enabled. An arc $(v, v')$ is annotated with "↑" if $v \not\models \mathit{fr}(\kappa) = 0$ and $v' \models \mathit{fr}(\kappa) = 0$, and with "↓" if $v \models \mathit{fr}(\kappa) = 0$ and $v' \not\models \mathit{fr}(\kappa) = 0$.

Note that to obtain a valid characterization, $\theta \geq 1 \wedge \forall\Box_{<\theta} q_0$ must be satisfied. This means that the transition from region $v_1$ to $v_2$ must be disabled while the one from $v_3$ to $v_4$ must be enabled. In turn, $\sigma$ must be false at $q_0 \wedge 0 < x < 1$ and true at $q_0 \wedge x = 1$. With the region graph, we can derive the relations $\mathit{xtion}()$ and $\mathit{timed}()$.
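[Figure 3: region graph for the SPP of Figure 1, with regions $v_0: q_0 \wedge x = 0 \wedge \mathit{fr}(\kappa) = 0$; $v_1: q_0 \wedge 0 < x < 1 \wedge \mathit{fr}(\kappa) \neq 0$; $v_2: \neg q_0 \wedge 0 < x < 1 \wedge \mathit{fr}(\kappa) \neq 0$; $v_3: q_0 \wedge x = 1 \wedge \mathit{fr}(\kappa) = 0$; $v_4: \neg q_0 \wedge x = 1 \wedge \mathit{fr}(\kappa) = 0$; $v_5: \neg q_0 \wedge x > 1 \wedge \mathit{fr}(\kappa) \neq 0$; $v_6: \neg q_0 \wedge x > 1 \wedge \mathit{fr}(\kappa) = 0$]

Fig. 3. A Simple Region Graph to Illustrate the Algorithm.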



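Table 1. Computation of KClosure[]().

Row (1):
$(\mathit{timed}(v_0, v_1), 0) \in CP[\mathit{true}](v_0, v_1)$
$(\mathit{timed}(v_1, v_3), 1) \in CP[\mathit{true}](v_1, v_3)$
$(\mathit{xtion}(v_3, v_4), 0) \in CP[\mathit{true}](v_3, v_4)$
$(\mathit{timed}(v_5, v_6), 1) \in CP[\mathit{true}](v_5, v_6)$
$(\mathit{xtion}(v_1, v_2), 0) \in CP[\mathit{true}](v_1, v_2)$
$(\mathit{timed}(v_2, v_4), 1) \in CP[\mathit{true}](v_2, v_4)$
$(\mathit{timed}(v_4, v_5), 0) \in CP[\mathit{true}](v_4, v_5)$
$(\mathit{timed}(v_6, v_5), 0) \in CP[\mathit{true}](v_6, v_5)$

Row (2):
$(\mathit{true}, 0) \in CP[\mathit{true}](v_0, v_1)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_1, v_3)$
$(\sigma(v_3) \wedge cost > 10, 0) \in CP[\mathit{true}](v_3, v_4)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_5, v_6)$
$(\sigma(v_1) \wedge cost > 10, 0) \in CP[\mathit{true}](v_1, v_2)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_2, v_4)$
$(\mathit{true}, 0) \in CP[\mathit{true}](v_4, v_5)$
$(\mathit{true}, 0) \in CP[\mathit{true}](v_6, v_5)$

Row (3):
$(\sigma(v_1) \wedge cost > 10, 0) \in CP[\mathit{true}](v_0, v_2)$
$(\sigma(v_1) \wedge cost > 10, 1) \in CP[\mathit{true}](v_1, v_4)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_2, v_5)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_4, v_6)$
$(\mathit{true}, 1^*) \in CP[\mathit{true}](v_6, v_6)$
$(\mathit{true}, 1) \in CP[\mathit{true}](v_0, v_3)$
$(\sigma(v_3) \wedge cost > 10, 1) \in CP[\mathit{true}](v_1, v_4)$
$(\sigma(v_3) \wedge cost > 10, 0) \in CP[\mathit{true}](v_3, v_5)$
$(\mathit{true}, 1^*) \in CP[\mathit{true}](v_5, v_5)$

Row (4):
$(\sigma(v_1) \wedge cost > 10, 1) \in CP[\mathit{true}](v_0, v_4)$
$(\sigma(v_3) \wedge cost > 10, 1) \in CP[\mathit{true}](v_0, v_4)$
$(\sigma(v_1) \wedge cost > 10, 1 + 1^*) \in CP[\mathit{true}](v_1, v_5)$
$(\sigma(v_3) \wedge cost > 10, 1 + 1^*) \in CP[\mathit{true}](v_1, v_5)$
$(\sigma(v_3) \wedge cost > 10, 1 + 1^* + 1^*) \in CP[\mathit{true}](v_3, v_6)$
$(\mathit{true}, 2 + 1^* + 1^*) \in CP[\mathit{true}](v_2, v_6)$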





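$\mathit{timed}(v_0, v_1) = \mathit{true}$; $\mathit{xtion}(v_0, v_1) = \mathit{false}$; $\mathit{timed}(v_1, v_2) = \mathit{false}$;

$\mathit{xtion}(v_1, v_2) = v_1(q_0 \wedge \neg q_1 \wedge 0 < x < 1 \wedge \mathit{fr}(\kappa) \neq 0 \wedge \sigma \wedge cost > 10) \wedge v_2(\neg q_0 \wedge q_1 \wedge 0 < x < 1 \wedge \mathit{fr}(\kappa) \neq 0) = \sigma(v_1) \wedge cost > 10$;

$\mathit{timed}(v_1, v_3) = \mathit{true}$; $\mathit{xtion}(v_1, v_3) = \mathit{false}$; $\mathit{timed}(v_2, v_4) = \mathit{true}$; $\mathit{xtion}(v_2, v_4) = \mathit{false}$; $\mathit{timed}(v_3, v_4) = \mathit{false}$;

$\mathit{xtion}(v_3, v_4) = v_3(q_0 \wedge \neg q_1 \wedge x = 1 \wedge \mathit{fr}(\kappa) = 0 \wedge \sigma \wedge cost > 10) \wedge v_4(\neg q_0 \wedge q_1 \wedge x = 1 \wedge \mathit{fr}(\kappa) = 0) = \sigma(v_3) \wedge cost > 10$;

$\mathit{timed}(v_4, v_5) = \mathit{true}$; $\mathit{xtion}(v_4, v_5) = \mathit{false}$; $\mathit{timed}(v_5, v_6) = \mathit{true}$; $\mathit{xtion}(v_5, v_6) = \mathit{false}$; $\mathit{timed}(v_6, v_5) = \mathit{true}$; $\mathit{xtion}(v_6, v_5) = \mathit{false}$.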



After running algorithm KClosure[true]() on the region graph, we obtain the membership relations shown in Table 1. In the table, we group the formulas into rows to make it more readable. The first two rows are set up for paths of length one, while the remaining rows are obtained with the transitivity (bypassing) law. In the third row, because of the time-passage loop through regions $v_5$ and $v_6$, we can deduce that $(\mathit{true}, 1^*) \in CP[\mathit{true}](v_5, v_5)$, which means that we can cycle through region $v_5$ an arbitrary number of times. ||
In the following discussion, we run our labeling algorithm on a small example to present the idea.

Example 3. A Test Run of the Labeling Algorithm. We illustrate our algorithm on the automaton shown in Figure 1 and the PCTL specification $cost \leq 100 \wedge \theta \geq 1 \wedge \forall\Box_{<\theta} q_0$. The region graph is shown in Figure 3. We first have the following derivation.

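$cost \leq 100 \wedge \theta \geq 1 \wedge \forall\Box_{<\theta} q_0$
$= cost \leq 100 \wedge \theta \geq 1 \wedge \neg\exists\Diamond_{<\theta}((\neg q_0) \wedge \exists\Box_{>0}\mathit{true})$

According to our labeling algorithm, the characterization formula is

$L[cost \leq 100]((\kappa)v_0) \wedge L[\theta \geq 1]((\kappa)v_0) \wedge \neg\bigvee_{u \in V}\left( L[\neg q_0](u) \wedge L[\exists\Box_{>0}\mathit{true}](u) \wedge \bigvee_{(\eta, T) \in CP[\mathit{true}]((\kappa)v_0, u)} (\eta \wedge T < \theta) \right)$

$\neg q_0$ is true only at $v_2, v_4, v_5, v_6$, and $\exists\Box_{>0}\mathit{true}$ is true at all four of these regions. Thus we have

$= cost \leq 100 \wedge \theta \geq 1 \wedge \neg\left( \bigvee_{(\eta,T) \in CP[\mathit{true}](v_0, v_2)} (\eta \wedge \theta > T) \vee \bigvee_{(\eta,T) \in CP[\mathit{true}](v_0, v_4)} (\eta \wedge \theta > T) \vee \bigvee_{(\eta,T) \in CP[\mathit{true}](v_0, v_5)} (\eta \wedge \theta > T) \vee \bigvee_{(\eta,T) \in CP[\mathit{true}](v_0, v_6)} (\eta \wedge \theta > T) \right)$

$= cost \leq 100 \wedge \theta \geq 1 \wedge \neg\left( \begin{array}{l} (\sigma(v_1) \wedge cost > 10 \wedge \theta > 0) \\ \vee\ (\sigma(v_1) \wedge cost > 10 \wedge \theta > 1) \\ \vee\ (\sigma(v_3) \wedge cost > 10 \wedge \theta > 1) \\ \vee\ (\sigma(v_1) \wedge cost > 10 \wedge \theta > 1 + 1^*) \\ \vee\ (\sigma(v_3) \wedge cost > 10 \wedge \theta > 1 + 1^*) \\ \vee\ (\sigma(v_1) \wedge cost > 10 \wedge \theta > 2 + 1^* + 1^*) \\ \vee\ (\sigma(v_3) \wedge cost > 10 \wedge \theta > 2 + 1^* + 1^*) \end{array} \right)$

$= cost \leq 100 \wedge \theta \geq 1 \wedge \neg((\sigma(v_1) \wedge cost > 10 \wedge \theta > 0) \vee (\sigma(v_3) \wedge cost > 10 \wedge \theta > 1))$

$= cost \leq 100 \wedge \theta \geq 1 \wedge (\neg\sigma(v_1) \vee cost \leq 10 \vee \theta \leq 0) \wedge (\neg\sigma(v_3) \vee cost \leq 10 \vee \theta \leq 1)$

$= cost \leq 100 \wedge \theta \geq 1 \wedge (\neg\sigma(v_1) \vee cost \leq 10) \wedge (\neg\sigma(v_3) \vee cost \leq 10 \vee \theta \leq 1)$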

This formula says that for a solution to exist, the discrete transition must be disabled at mode $q_0$ when $0 < x < 1$ holds, with either $\sigma(v_1) = \mathit{false}$ or $cost \leq 10$. Furthermore, according to the last conjunct, if there is going to be a nonzeno computation (which must leave $q_0$ through region $v_3$), then it is necessary that $\sigma(v_3) = \mathit{true}$, $cost > 10$, and $\theta = 1$. ||






The following theorem establishes the correctness of our labeling algorithm. The proof parallels that of a similar algorithm presented in [17].

Theorem 1. Given a $PO(A, \phi, \lambda)$, a subformula $\phi_1$, and $v = (q, \Lambda)$, after executing $L[\phi_1](v)$ in our labeling algorithm, $L[\phi_1](v)$ is satisfiable iff for some $I$ and some $\chi$, $(q, \nu) \models^{\chi}_{I} \phi_1$ for every $(q, \nu) \in v$. ||



3.5 Parametric Optimization Step 

Given an SPP $A = (Q, q_0, X, H, \mu, E, \tau, \pi)$ and a PCTL formula $\phi$, our labeling algorithm Label$(A, \phi)$ returns the predicate $L[\phi]((q_0, [\mathbf{0}]))$ (a constraint on parameters and controller choices) in such a way that $L[\phi]((q_0, [\mathbf{0}]))$ is satisfiable iff $(q_0, \mathbf{0}) \models^{\chi}_{I} \phi$ for some interpretation $I$ and some controller $\chi$. The constraint on parameters and controller choices is thus fully characterized by $L[\phi]((q_0, [\mathbf{0}]))$. In the following, we demonstrate how to process $L[\phi]((q_0, [\mathbf{0}]))$ to solve our parametric optimization problem.

Step 1: $L[\phi]((q_0, [0]))$ can be rearranged into the disjunctive normal form $A_1 \vee A_2 \vee \ldots \vee A_m$, for some m. Each of the conjunctions can be further rearranged into two types of atoms: (type 1) $\sum_i a_i u_i \sim c$ and (type 2) $\sigma(v)$. Type 2 atoms express the consistency of controller choices. A conjunction is satisfiable iff the subconjunction of its type 1 atoms is satisfiable and the subconjunction of its type 2 atoms is also satisfiable. The satisfiability of conjunctions of type 2 atoms can be decided with standard BDD or DBM technology. $L[\phi]((q_0, [0]))$ remains equivalent if those conjunctions with unsatisfiable subconjunctions of type 2 atoms are eliminated.

Step 2: Assume that $L[\phi]((q_0, [0]))$ has no conjunctions with unsatisfiable subconjunctions of type 2 atoms. The constraint on parameters for the existence of any controller is thus

$$\bigvee_{1 \le i \le m} A'_i, \qquad \text{where } A'_i \text{ is the subconjunction of type 1 atoms of } A_i.$$

Then the parametric optimization problem can be broken down into m subproblems, which ask for the maximum of the objective function on the linear inequality systems $A'_i$. Thus, the parametric optimization problem of controllable timed automata is reduced to that of integer linear programming, which is reasonably well studied in the literature, although the size of the linear programming instance is likely to be exponential in the worst case. The answer to our optimization problem is the maximum of the answers to the m subproblems.
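Steps 1 and 2 can be sketched as follows. The Python code below is an illustration under stated assumptions and is not the authors' implementation. The atom encoding is hypothetical, and parameters are assumed to be integers within a bounded box. Brute-force enumeration of that box stands in for a real integer linear programming solver. The sketch drops every disjunct whose type 2 literals are contradictory, solves one maximization subproblem per remaining disjunct, and returns the largest answer. As input it uses the DNF of the constraint $\mathit{cost} < 100 \wedge \theta \ge 1 \wedge (\neg\sigma(v_1) \vee \mathit{cost} < 10) \wedge (\neg\sigma(v_3) \vee \mathit{cost} < 10 \vee \theta \le 1)$ derived in the running example, plus one deliberately contradictory disjunct.

```python
from itertools import product

# Hypothetical encoding (ours):
#   type-1 atom (coeffs, c)  means  sum(a * u for u, a in coeffs) <= c
#   type-2 atom (v, b)       means  sigma(v) == b   (a controller choice)

def type2_satisfiable(lits):
    """Literals over sigma are satisfiable iff no v occurs with both polarities."""
    seen = {}
    for v, b in lits:
        if seen.setdefault(v, b) != b:
            return False
    return True

def holds(atom, x):
    coeffs, c = atom
    return sum(a * x[u] for u, a in coeffs.items()) <= c

def ilp_max(atoms, objective, box):
    """Brute-force stand-in for an ILP solver over a bounded integer box."""
    names = sorted(box)
    best = None
    for vals in product(*(range(box[n][0], box[n][1] + 1) for n in names)):
        x = dict(zip(names, vals))
        if all(holds(a, x) for a in atoms):
            val = sum(k * x[u] for u, k in objective.items())
            best = val if best is None else max(best, val)
    return best

def parametric_optimum(dnf, objective, box):
    answers = []
    for type1, type2 in dnf:
        if not type2_satisfiable(type2):         # Step 1: eliminate the disjunct
            continue
        ans = ilp_max(type1, objective, box)     # Step 2: one subproblem each
        if ans is not None:
            answers.append(ans)
    return max(answers) if answers else None

# Integer parameters, so cost < 100 becomes cost <= 99, and so on.
BASE = [({"cost": 1}, 99), ({"theta": -1}, -1)]          # cost <= 99, theta >= 1
COST_LE_9, THETA_LE_1 = ({"cost": 1}, 9), ({"theta": 1}, 1)
DNF = [
    (BASE, [("v1", False), ("v3", False)]),
    (BASE + [COST_LE_9], [("v1", False)]),
    (BASE + [THETA_LE_1], [("v1", False)]),
    (BASE + [COST_LE_9], [("v3", False)]),
    (BASE + [COST_LE_9], []),
    (BASE + [COST_LE_9, THETA_LE_1], []),
    ([({"cost": 1}, 120)], [("v1", True), ("v1", False)]),  # contradictory
]
OBJECTIVE = {"cost": 1, "theta": -1}                     # maximise cost - theta
BOX = {"cost": (0, 120), "theta": (0, 20)}

print(parametric_optimum(DNF, OBJECTIVE, BOX))
```

The result, 98 (cost 99, θ = 1), agrees with the value reported for the nonzeno case in Example 4. Without the Step 1 filter, the contradictory disjunct would wrongly yield 120.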

Example 4. For the automaton, specification, and optimization metric $\mathit{cost} - \theta$ to maximize in Example 1, the optimization metric is $\max(\mathit{cost} - \theta) = 10 - 0 = 10$ when only zeno computations exist, and $\max(\mathit{cost} - \theta) = 99 - 1 = 98$ when at least one nonzeno computation exists. Thus the optimal value of the metric is 98 with respect to specification $\phi$ and metric $\mathit{cost} - \theta$.

After running our algorithm for the second specification $\phi'$ and metric $-\theta$, we find that the optimal value of $-\theta$ is $-1$, indicating that 1 is the minimum deadline from a $q_0$ state to a $q_1$ state subject to the restriction of $\phi'$. Due to the page limit, we leave the details to the reader.



4 Conclusion 

We have investigated the parametric optimization issue of real-time systems modeled by controllable timed automata augmented with static parameters. An algorithm has been proposed for deriving constraints over the static parameters as well as the synthesized controller that would provide an environment in which the system functions correctly. To the best of our knowledge, our work is the first attempt to investigate parametric analysis, controller synthesis and parametric optimization in a unified setting. By giving a complete characterization of the controller as well as the parameter valuations (satisfying a given specification) in terms of a set of linear inequalities, parametric optimization is then carried out in the framework of integer linear programming, which is relatively well studied. Efficiency has not been addressed much in this work. As region graphs of timed automata are, in general, exponential in size, we expect our algorithm to take exponential time in the worst case. One way to circumvent this inefficiency is to look into the possibility of incorporating the so-called symbolic techniques, which have been proven to be useful for controller synthesis (see, e.g., [6]). Analyzing the computational complexity of our algorithm (as well as of the problem) and subsequently improving the algorithm (perhaps based on symbolic approaches) are among our future research topics on parametric optimization.
Abstract. Model checking would answer all finite-state verification problems, if
it were not for the notorious state-space explosion problem. A problem of practical
importance, which has attracted less attention, is to close open systems. Standard
model checkers cannot handle open systems directly and closing is commonly 
done by adding an environment process, which in the simplest case behaves 
chaotically. However, for model checking, the way of closing should be well- 
considered to alleviate the state-space explosion problem. This is especially true 
in the context of model checking SDL with its asynchronous message-passing 
communication, since chaotically sending and receiving messages immediately 
leads to a combinatorial explosion caused by all combinations of messages in the 
input queues. 

In this paper we develop an automatic transformation yielding a closed system. 
By embedding the outside chaos into the system’s processes, we avoid the state- 
space penalty in the input queues mentioned above. To capture the chaotic timing 
behaviour of the environment, we introduce a non-standard 3-valued timer ab- 
straction. We use data-flow analysis to detect instances of chaotic variables and 
timers and prove the soundness of the transformation, which is based on the result 
of the analysis. 

Keywords: Model checking, open reactive systems, data-flow analysis, SDL. 



1 Introduction 

Model checking is considered the method of choice in the verification of reactive systems and is increasingly accepted in industry for its push-button appeal. To alleviate the notorious state-space explosion problem, a host of techniques has been invented, e.g., partial-order reduction and abstraction, to mention two prominent approaches.

A problem of practical importance, which has attracted less attention, is to close open systems. Since standard model checkers, e.g., Spin, cannot handle open systems, one first has to transform the model into a closed one. This is commonly done by adding an environment process that, in order to be able to infer properties for the concrete system, must exhibit at least all the behaviour of the real environment. The simplest safe abstraction of the environment thus behaves chaotically. When done manually, this closing, as simple as it is, is tiresome and error-prone for large systems, if only because of the sheer amount of signals. Moreover, for model checking, the way of closing should be well considered to counter the state-space explosion problem. This is especially true in the context of model checking SDL programs (Specification and Description Language) with their asynchronous message-passing communication model. Sending arbitrary message streams to the unbounded input queues immediately leads to an infinite state space, unless some assumptions restricting the environment behaviour are incorporated in the closing process. Even so, external chaos results in a combinatorial explosion caused by all combinations of messages in the input queues. This way of closing is even more wasteful, since most of the messages are dropped by the receiver due to the discard feature of SDL-92.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 319-12^2001.

Another problem the closing must address is that the data carried with the messages coming from the environment is usually drawn from some infinite data domain. Since, furthermore, we are dealing with the discrete-time semantics of SDL, special care must be taken to ensure that the chaos also shows more behaviour wrt. timing issues such as timeouts and time progress.

To solve these three problems, we develop an automatic transformation yielding a closed system. (1) By embedding the outside chaos into the system's processes, we avoid the state-space penalty in the input queues mentioned above. (2) We use data abstraction, condensing data from outside into a single abstract value ⊤, to deal with the infinity of environmental data. In effect, by embedding the chaos process and abstracting the data, there is no need to ever consider messages from the outside at all. Hence, the transformation removes the corresponding input statements. When removing the reception of chaotic data, however, we must take into account the cone of influence of the removed statements, lest we get less behaviour than before. Therefore, we use data-flow analysis to detect instances of chaotically influenced variables and timers. (3) To capture the chaotic timing behaviour, we introduce a non-standard 3-valued timer abstraction.

Based on the result of the analysis, the transformation yields a closed system $S^\sharp$ which shows more behaviour in terms of traces than the original one. For formulas of next-free LTL we thus get the desired property preservation: if $S^\sharp \models \varphi$ then $S \models \varphi$.

The remainder of the paper is organized as follows. Section 2 introduces the syntax and semantics we use, modelling the communication and timed behaviour of SDL. In Section 3 we present the data-flow algorithm marking variable and timer instances influenced by chaos. Section 4 then develops the transformation and proves its soundness. Finally, in Section 5 we conclude with related and future work.

2 Semantics 

In this section, we fix the syntax and semantics of our analysis. Since we take SDL as source language, our operational model is based on asynchronously communicating state machines (processes) with top-level concurrency. A program Prog is given as the parallel composition of a finite number of processes. A process P is described by a four-tuple $(\mathit{Var}, \mathit{Loc}, \sigma_{init}, \mathit{Edg})$, where Var denotes a finite set of variables, and Loc denotes a finite set of locations or control states. We assume the sets of variables $\mathit{Var}_i$ of processes $P_i$ in a program $\mathit{Prog} = \prod_{i=1}^{n} P_i$ to be disjoint. A mapping of variables to values is called a valuation; we denote the set of valuations by $\mathit{Val} : \mathit{Var} \to D$. We assume standard data domains such as $\mathbb{N}$, Bool, etc., write D when leaving the data domain unspecified, and silently assume all expressions to be well-typed. $\Sigma = \mathit{Loc} \times \mathit{Val}$ is the set of states, where a process has one designated initial state $\sigma_{init} = (l_{init}, \eta_{init}) \in \Sigma$. An edge of the state machine describes a change of configuration resulting from performing an action from a set Act of actions; the set $\mathit{Edg} \subseteq \mathit{Loc} \times \mathit{Act} \times \mathit{Loc}$ denotes the set of edges.

As actions, we distinguish (1) input of a signal s containing a value to be assigned to a local variable, (2) sending a signal s together with a value described by an expression to a process P', and (3) assignments. In SDL, each transition starts with an input action, hence we assume the inputs to be unguarded, while output and assignment can be guarded by a boolean expression g, its guard. The three classes of actions are written as $?s(x)$, $g \triangleright P'!s(e)$, and $g \triangleright x := e$, respectively, and we use $\alpha, \alpha', \ldots$ when leaving the class of actions unspecified. For an edge $(l, \alpha, \hat{l}) \in \mathit{Edg}$, we write more suggestively $l \xrightarrow{\alpha} \hat{l}$.

Time aspects of a system behaviour are specified by actions dealing with timers. Each process has a finite set of timer variables (with typical elements $t, t', \ldots$), which consist of a boolean flag indicating whether the timer is active or not, and a natural-number value. A timer can be either set to a value, i.e., activated to run for the designated period, or reset, i.e., deactivated. Setting and resetting are expressed by guarded actions of the form $g \triangleright \text{set } t := e$ and $g \triangleright \text{reset } t$. If a timer expires, i.e., the value of a timer becomes zero, it can cause a timeout, upon which the timer is reset. The timeout action is denoted by $g_t \triangleright \text{reset } t$, where the timer guard $g_t$ expresses the fact that the action can only be taken upon expiration.

As the syntax of a program is given in two levels, state machines and their parallel composition, so is its semantics. In SDL's asynchronous communication model, a process receives messages via a single associated input queue. We call a state of a process together with its input queue a configuration $(\sigma, q)$; we write Γ for the set of configurations. We write $\epsilon$ for the empty queue; $(s, v) :: q$ denotes a queue with message $(s, v)$ (consisting of a signal s and a value v) at the head of the queue, i.e., $(s, v)$ is the message to be input next; likewise, the queue $q :: (s, v)$ contains $(s, v)$ as the most recently entered message. The behaviour of a single process is then given by sequences of configurations $(\sigma_{init}, \epsilon) = (\sigma_0, q_0) \xrightarrow{\lambda} (\sigma_1, q_1) \xrightarrow{\lambda} \ldots$ starting from the initial one, i.e., the initial state and the empty queue. The step semantics $\rightarrow\ \subseteq \Gamma \times \mathit{Lab} \times \Gamma$ is given as a labelled transition relation between configurations. The labels differentiate between internal τ-steps, "tick"-steps, which globally decrease all active timers, and communication steps, either input or output, which are labelled by a triple of process (of destination or origin, respectively), signal, and value being transmitted. Depending on location, valuation, the possible next actions, and the content of the input queue, the possible successor configurations are given by the rules of Table 1, where we assume a given set $\mathit{Sig}_{ext}$ of signals exchanged with the environment.

Inputting a value means reading a value belonging to a matching signal from the head of the queue and updating the local valuation accordingly (rule Input), where $\eta \in \mathit{Val}$, and $\eta[x \mapsto v]$ stands for the valuation equalling η for all $y \in \mathit{Var}$ except for $x \in \mathit{Var}$, where $\eta[x \mapsto v](x) = v$ holds instead. A specific feature of SDL-92 is captured by rule Discard: if the head of the input queue cannot be reacted upon at the current control state, i.e., there is no input action originating from the location treating this signal, then the message is just discarded, leaving control state and valuation unchanged.
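Table 1. Step Semantics for Process P.

$$\frac{l \xrightarrow{?s(x)} \hat l \in \mathit{Edg}}{(l, \eta, (s,v) :: q) \xrightarrow{\tau} (\hat l, \eta[x \mapsto v], q)}\ \textsc{Input}$$

$$\frac{l \xrightarrow{g \,\triangleright\, P'!(s,e)} \hat l \in \mathit{Edg} \qquad [\![g]\!]_\eta = \mathit{true} \qquad [\![e]\!]_\eta = v}{(l, \eta, q) \xrightarrow{P'!(s,v)} (\hat l, \eta, q)}\ \textsc{Output}$$

$$\frac{l \xrightarrow{?s'(x)} \hat l \in \mathit{Edg} \Rightarrow s' \neq s}{(l, \eta, (s,v) :: q) \xrightarrow{\tau} (l, \eta, q)}\ \textsc{Discard}$$

$$\frac{v \in D}{(l, \eta, q) \xrightarrow{P?(s,v)} (l, \eta, q :: (s,v))}\ \textsc{Receive}$$

$$\frac{l \xrightarrow{g \,\triangleright\, x := e} \hat l \in \mathit{Edg} \qquad [\![g]\!]_\eta = \mathit{true} \qquad [\![e]\!]_\eta = v}{(l, \eta, q) \xrightarrow{\tau} (\hat l, \eta[x \mapsto v], q)}\ \textsc{Assign}$$

$$\frac{l \xrightarrow{g \,\triangleright\, \text{set } t := e} \hat l \in \mathit{Edg} \qquad [\![g]\!]_\eta = \mathit{true} \qquad [\![e]\!]_\eta = v}{(l, \eta, q) \xrightarrow{\tau} (\hat l, \eta[t \mapsto on(v)], q)}\ \textsc{Set}$$

$$\frac{l \xrightarrow{g \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg} \qquad [\![g]\!]_\eta = \mathit{true}}{(l, \eta, q) \xrightarrow{\tau} (\hat l, \eta[t \mapsto \mathit{off}], q)}\ \textsc{Reset}$$

$$\frac{l \xrightarrow{g_t \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg} \qquad [\![t]\!]_\eta = on(0)}{(l, \eta, q) \xrightarrow{\tau} (\hat l, \eta[t \mapsto \mathit{off}], q)}\ \textsc{Timeout}$$

$$\frac{\neg\big(l \xrightarrow{\alpha} \hat l \in \mathit{Edg} \wedge \alpha = g_t \,\triangleright\, \text{reset } t\big) \qquad [\![t]\!]_\eta = on(0)}{(l, \eta, q) \xrightarrow{\tau} (l, \eta[t \mapsto \mathit{off}], q)}\ \textsc{TDiscard}$$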

Unlike input, output is guarded, so sending a message involves evaluating the guard and the expression according to the current valuation (rule Output). Assignment in rule Assign works analogously, except that the step is internal. Receiving a message by asynchronous communication simply means putting it into the input queue, where in the Receive rule, P is the identity of the process.
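The queue-handling rules can be prototyped directly. The following Python sketch uses an encoding of our own: valuations are dictionaries, queues are tuples with the head first, and guards and expressions are Python functions of the valuation. It implements only the rules Receive, Input, Discard and Assign. Output and the timer rules are left out, so it is a simplified reading of Table 1 rather than a complete semantics.

```python
# Hypothetical encoding (ours, not the paper's):
#   configuration = (location, valuation dict, queue tuple of (signal, value), head first)
#   edge = (src, action, dst) with action ("in", s, x) for ?s(x)
#          or ("asg", guard, x, expr) for g |> x := e (guard/expr: functions of the valuation)

def receive(conf, s, v):
    """Rule Receive: an incoming message is appended at the tail of the queue."""
    loc, eta, q = conf
    return (loc, eta, q + ((s, v),))

def tau_steps(edges, conf):
    """Internal successor configurations by rules Input, Discard and Assign."""
    loc, eta, q = conf
    succ = []
    if q:
        (s, v), rest = q[0], q[1:]
        handles_s = False
        for src, act, dst in edges:
            if src == loc and act[0] == "in" and act[1] == s:      # Input
                handles_s = True
                succ.append((dst, {**eta, act[2]: v}, rest))
        if not handles_s:                                         # Discard
            succ.append((loc, eta, rest))
    for src, act, dst in edges:
        if src == loc and act[0] == "asg" and act[1](eta):        # Assign
            succ.append((dst, {**eta, act[2]: act[3](eta)}, q))
    return succ

EDGES = [
    ("l0", ("in", "a", "x"), "l1"),
    ("l1", ("asg", lambda e: e["x"] > 0, "y", lambda e: 2 * e["x"]), "l0"),
]
c0 = receive(receive(("l0", {"x": 0, "y": 0}, ()), "b", 5), "a", 3)
c1, = tau_steps(EDGES, c0)   # "b" has no input at l0, so it is discarded
c2, = tau_steps(EDGES, c1)   # "a" is consumed by ?a(x)
c3, = tau_steps(EDGES, c2)   # guarded assignment y := 2 * x
print(c1, c2, c3)
```

The run shows the SDL-92 discard feature at work: the unexpected signal b is silently dropped before a can be consumed.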

Concerning the temporal behaviour, timers are treated in valuations as variables, distinguishing active and deactivated timers. The set command activates a timer, setting its value to a specified time; reset deactivates it; both actions are guarded (cf. rules Set and Reset). A timeout may occur if an active timer has expired, i.e., reached zero (rule Timeout). If an expired timer has no timeout action at the current location, it is simply reset (rule TDiscard).

We assume for the non-timer guards that at least one of them evaluates to true for each configuration.¹

The global transition semantics for a program $\mathit{Prog} = \prod_{i=1}^{n} P_i$ is given by a standard product construction: configurations and initial states are paired, and global transitions synchronize via their common labels. The global step relation $\rightarrow\ \subseteq \Gamma \times \mathit{Lab} \times \Gamma$ is given by the rules of Table 2.

¹ This assumption corresponds, at the SDL source-language level, to the natural requirement that each conditional construct must cover all cases, for instance by having at least a default branch: the system should not block because of a non-covered alternative in a case construct.
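Table 2. Parallel Composition.

$$\frac{(\sigma_1,q_1) \xrightarrow{P!(s,v)} (\hat\sigma_1,\hat q_1) \qquad (\sigma_2,q_2) \xrightarrow{P?(s,v)} (\hat\sigma_2,\hat q_2) \qquad s \notin \mathit{Sig}_{ext}}{(\sigma_1,q_1) \times (\sigma_2,q_2) \xrightarrow{\tau} (\hat\sigma_1,\hat q_1) \times (\hat\sigma_2,\hat q_2)}\ \textsc{Comm}$$

$$\frac{(\sigma_1,q_1) \xrightarrow{\lambda} (\hat\sigma_1,\hat q_1) \qquad \lambda \in \{\tau, P?(s,v), P!(s,v) \mid s \in \mathit{Sig}_{ext}\}}{(\sigma_1,q_1) \times (\sigma_2,q_2) \xrightarrow{\lambda} (\hat\sigma_1,\hat q_1) \times (\sigma_2,q_2)}\ \textsc{Interleave}$$

$$\frac{\mathit{blocked}(l,\eta,q)}{(l,\eta,q) \xrightarrow{\mathit{tick}} (l,\eta[t \mapsto (t-1)],q)}\ \textsc{Tick}$$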



Asynchronous communication between the two processes uses a system-internal signal s to exchange a common value v, as given by rule Comm. As far as τ-steps and communication messages using external signals are concerned, each process can proceed on its own by rule Interleave. Both rules have a symmetric counterpart, which we elide. Time elapses by counting down active timers till zero, which happens in case no untimed actions are possible. In rule Tick, this is expressed by the predicate blocked on configurations: blocked(γ) holds if no move is possible by the system except either a clock tick or a reception of a message from the outside, i.e., if $\gamma \xrightarrow{\lambda}$ for some label λ, then $\lambda = \mathit{tick}$ or $\lambda = P?(s, v)$ for some $s \in \mathit{Sig}_{ext}$. In other words, the time-elapsing steps are those with least priority. Note in passing that, due to the discarding feature, $\mathit{blocked}(\sigma, q)$ implies $q = \epsilon$. The counting down of the timers is written $\eta[t \mapsto (t-1)]$, by which we mean that all currently active timers are decreased by one, i.e., $on(n+1) - 1 = on(n)$; non-active timers are not affected. Note that the operation is undefined for on(0), which is justified by the following lemma.

Lemma 1. Let S be a system and $(l, \eta, q) \in \Gamma$ a configuration. If $(l, \eta, q) \xrightarrow{\mathit{tick}} (l, \eta', q)$, then $[\![t]\!]_\eta \neq on(0)$, for all timers t.

In SDL, timeouts are often considered as specific timeout messages kept in the input queue like any other message, and timer expiration consequently is seen as adding a timeout message to the queue. We use an equivalent presentation of this semantics, where timeouts are not put into the input queue, but are modelled more directly by guards. The equivalence of timeouts-by-guards and timeouts-as-messages in the presence of SDL's asynchronous communication model is argued for in previous work. The semantics we use is the one described there, and is also implemented in DTSpin, a discrete-time extension of the Spin model checker.

3 Marking Chaotically-Influenced Variable and Timer Instances 

In this section, we present a straightforward data-flow analysis marking variable and timer instances that may be influenced by the chaotic environment. The analysis forms the basis of the transformation in Section 4.

The analysis works on a simple flow-graph representation of the system, where each process is represented by a single flow graph whose nodes n are associated with the process' actions, and the flow relation captures the intra-process data dependencies. Since the structure of the language we consider is rather simple, the flow graph can easily be obtained by standard techniques.

The analysis works on an abstract representation of the data values, where ⊤ is interpreted as a value chaotically influenced by the environment and ⊥ stands for a non-chaotic value. We write $\eta^\alpha, \eta^\alpha_1, \ldots$ for abstract valuations, i.e., for typical elements from $\mathit{Val}^\alpha = \mathit{Var} \to \{\top, \bot\}$. The abstract values are ordered $\bot \le \top$, and the order is lifted pointwise to valuations. With this ordering, the set of valuations forms a complete lattice, where we write $\eta_\bot$ for the least element, given as $\eta_\bot(x) = \bot$ for all $x \in \mathit{Var}$, and we denote the least upper bound of $\eta^\alpha_1, \ldots, \eta^\alpha_n$ by $\bigvee_{i=1}^{n} \eta^\alpha_i$ (or $\eta^\alpha_1 \vee \eta^\alpha_2$ in the binary case).

Each node n of the flow graph has an associated abstract transfer function $f_n : \mathit{Val}^\alpha \to \mathit{Val}^\alpha$. The functions are given in Table 3, where $a_n$ denotes the action associated with the node n. The equations are mostly straightforward, describing the change of the abstract valuations depending on the sort of action at the node. The only case deserving mention is the one for ?s(x), whose equation captures the inter-process data flow from a sending to a receiving action: the received variable is marked ⊤ if s comes from the environment, and otherwise receives the join of the abstract values of all expressions sent with signal s. It is easy to see that the functions $f_n$ are monotone.



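Table 3. Transfer Functions/Abstract Effect for Process P.

$$
\begin{aligned}
f(?s(x))\,\eta^\alpha &= \begin{cases} \eta^\alpha[x \mapsto \top] & \text{if } s \in \mathit{Sig}_{ext}\\ \eta^\alpha\big[x \mapsto \bigvee\{[\![e]\!]_{\eta^\alpha(n')} \mid a_{n'} = g \triangleright P!s(e) \text{ for some node } n'\}\big] & \text{else}\end{cases}\\
f(g \triangleright P!s(e))\,\eta^\alpha &= \eta^\alpha\\
f(g \triangleright x := e)\,\eta^\alpha &= \eta^\alpha[x \mapsto [\![e]\!]_{\eta^\alpha}]\\
f(g \triangleright \text{set } t := e)\,\eta^\alpha &= \eta^\alpha[t \mapsto [\![e]\!]_{\eta^\alpha}]\\
f(g \triangleright \text{reset } t)\,\eta^\alpha &= \eta^\alpha[t \mapsto \mathit{off}]\\
f(g_t \triangleright \text{reset } t)\,\eta^\alpha &= \eta^\alpha[t \mapsto \mathit{off}]
\end{aligned}
$$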


Upon start of the data-flow analysis, at each node, the variables' values are assumed to be defined, i.e., the initial valuation is the least one: $\eta^\alpha_{init}(n) = \eta_\bot$. This choice rests on the assumption that all local variables of each process are properly initialized. We are interested in the least solution to the data-flow problem given by the following constraint set:

$$\eta^\alpha_{post}(n) \ge f_n(\eta^\alpha_{pre}(n)) \qquad (1)$$

$$\eta^\alpha_{pre}(n) \ge \bigvee\{\eta^\alpha_{post}(n') \mid (n', n) \text{ in flow relation}\} \qquad (2)$$

For each node n of the flow graph, the data-flow problem is specified by two inequations or constraints. The first one relates the abstract valuation $\eta^\alpha_{pre}(n)$ before entering the node with the valuation $\eta^\alpha_{post}(n)$ afterwards via the abstract effects of Table 3; the second one collects at the entry of n the post-valuations of all its predecessors. The least fixpoint of the constraint set can be computed iteratively in a fairly standard way by a worklist algorithm, where the worklist steers the iterative loop until the least fixpoint is reached. The algorithm for our problem is shown in Fig. 1.

    input:  the flow graph of the program
    output: η^α_pre, η^α_post

    η^α(n) := η^α_init(n);
    WL := {n | a_n = ?s(x), s ∈ Sig_ext};
    repeat
      pick n ∈ WL;
      let S = {n' ∈ succ(n) | f_n(η^α(n)) ≰ η^α(n')}
      in
        for all n' ∈ S: η^α(n') := η^α(n') ∨ f_n(η^α(n));
        WL := WL \ n ∪ S;
    until WL = ∅;
    η^α_pre(n) := η^α(n);
    η^α_post(n) := f_n(η^α(n))

Fig. 1. Worklist Algorithm.

The worklist data structure WL used in the algorithm is a set of elements, more specifically a set of nodes from the flow graph, and we denote by succ(n) the set of successor nodes of n in the flow graph in forward direction. It supports picking an arbitrary element from the set (without removing it), and we write $WL \setminus n$ for the worklist without the node n and ∪ for set union on the elements of the worklist. The algorithm starts with the least valuation on all nodes and an initial worklist containing the nodes with input from the environment. It enlarges the valuation within the given lattice step by step until it stabilizes, i.e., until the worklist is empty. If adding the abstract effect of one node to the current state enlarges the valuation, i.e., if the set S is non-empty, the successor nodes in S are (re-)entered into the list of unfinished ones. Since the set of variables in the system is finite, and thus so is the lattice of abstract valuations, termination of the algorithm is immediate.

Lemma 2 (Termination). The algorithm of Figure 1 terminates.

Proof. Immediate, since the set of variables Var in the program is finite, and hence the lattice $\mathit{Val}^\alpha = \mathit{Var} \to \{\top, \bot\}$ is finite as well. □

With the worklist as a set-like data structure, the algorithm is free to work off the list in any order. In practice, more deterministic data structures and traversal strategies are appropriate, for instance traversing the graph in a breadth-first manner.
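The following Python sketch combines the worklist algorithm of Fig. 1 with the variable cases of Table 3. It is an illustration under our own simplifications and not the authors' tool. The flow-graph encoding is hypothetical, timers are left out, and guards are ignored because the transfer functions do not depend on them. The worklist is a FIFO queue, one of the deterministic strategies mentioned above. An abstract valuation is represented by the set of variables mapped to ⊤, so join is set union and the order is set inclusion. Two points deserve attention. First, the update joins $f_n(\eta^\alpha(n))$ into $\eta^\alpha(n')$, so contributions of other predecessors are kept. Second, since the case $f(?s(x))$ reads the valuations of the sending nodes, the sketch assumes that a receiving node must be revisited whenever a sender of the same signal changes, and it re-queues such nodes explicitly.

```python
from collections import deque

# Hypothetical flow-graph encoding (ours): node -> action, where an action is
#   ("in", s, x)        ?s(x)
#   ("out", s, vars)    g |> P!s(e), vars = free variables of e
#   ("asg", x, vars)    g |> x := e, vars = free variables of e
# An abstract valuation is the frozenset of variables mapped to TOP.

def analyse(actions, succ, sig_ext):
    eta = {n: frozenset() for n in actions}              # eta_init(n) = eta_bottom
    receivers = {}
    for n, a in actions.items():
        if a[0] == "in":
            receivers.setdefault(a[1], []).append(n)

    def f(n):
        """Transfer function f_n applied to the current pre-valuation of n."""
        a, pre = actions[n], eta[n]
        if a[0] == "in":
            _, s, x = a
            chaotic = s in sig_ext or any(
                actions[m][0] == "out" and actions[m][1] == s
                and bool(eta[m] & actions[m][2]) for m in actions)
            return pre | {x} if chaotic else pre - {x}
        if a[0] == "asg":
            _, x, used = a
            return pre | {x} if pre & used else pre - {x}
        return pre                                        # output: no change

    wl = deque(n for n, a in actions.items() if a[0] == "in" and a[1] in sig_ext)
    while wl:
        n = wl.popleft()
        post = f(n)
        changed = [m for m in succ.get(n, []) if not post <= eta[m]]
        for m in changed:
            eta[m] = eta[m] | post                        # join, not overwrite
            todo = [m]
            if actions[m][0] == "out":                    # inter-process flow
                todo += receivers.get(actions[m][1], [])
            for r in todo:
                if r not in wl:
                    wl.append(r)
    return eta, {n: f(n) for n in actions}

# Process P: ?req(x); y := x + 1; P'!fwd(y)   Process P': ?fwd(z); w := 7
ACTIONS = {
    1: ("in", "req", "x"),
    2: ("asg", "y", frozenset({"x"})),
    3: ("out", "fwd", frozenset({"y"})),
    4: ("in", "fwd", "z"),
    5: ("asg", "w", frozenset()),
}
SUCC = {1: [2], 2: [3], 3: [1], 4: [5], 5: [4]}
pre, post = analyse(ACTIONS, SUCC, {"req"})
print(pre, post)
```

In the example, chaos entering through the external signal req reaches y and, via the internal signal fwd, the variable z of the second process. The variable w, which is assigned a constant, stays unmarked.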

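After termination the algorithm yields two mappings $\eta^\alpha_{pre}, \eta^\alpha_{post} : \mathit{Node} \to \mathit{Val}^\alpha$. On a location l, the result of the analysis is given by $\eta^\alpha(l) = \bigvee\{\eta^\alpha_{post}(\tilde n) \mid \tilde n = \tilde l \xrightarrow{\alpha} l\}$, also written as $\eta^\alpha_l$. The definition is justified by the following observation:

Lemma 3. Given a location l and a node $\hat n$ from the flow graph such that $\hat n = l \xrightarrow{\alpha} \hat l$. Then $\eta^\alpha_{pre}(\hat n) = \bigvee\{\eta^\alpha_{post}(\tilde n) \mid \tilde n = \tilde l \xrightarrow{\alpha'} l\}$.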

4 Closing the System 

The analysis marks instances of variables and timers potentially influenced by the chaotic environment. Based on this information, we transform the given system into a closed one, which shows more behaviour than the original. Since, for model checking, we cannot live with the infinity of data injected from outside by the chaotic environment, we abstract this infinity into one single abstract value ⊤. For chaotically influenced timer values, we need a more refined abstraction using 3 different values (cf. Section 4.1). Since the abstract system is still open, we close it in a second step, also implementing the abstract values by concrete ones (cf. Section 4.2). With the chaotic environment embedded into the now closed system, we remove, as an optimization for model checking, external signals from the input queues. Special care is taken to properly embed the chaotic behaviour wrt. the timed behaviour.



4.1 Abstracting Data 

As mentioned, we extend each data domain by an additional value ⊤, representing data received from the outside, i.e., we now assume domains such as $\mathbb{N}^\top = \mathbb{N} \cup \{\top\}$, $\mathit{Bool}^\top = \mathit{Bool} \cup \{\top\}$, ..., where we do not distinguish notationally between the various types of chaotic values. These values ⊤ are considered as the largest values, i.e., we introduce ≤ as the smallest reflexive relation with $v \le \top$ for all elements v (separately for each domain). The strict lifting of a valuation $\eta^\top$ to expressions is denoted by $[\![\cdot]\!]_{\eta^\top}$, i.e., $[\![e]\!]_{\eta^\top} = \top$ if e contains a variable x such that $\eta^\top(x) = \top$.

The step semantics is given (as before) by the rules of Tables 1 and 2, except for the following differences. ⊤-valued guards behave as if evaluating to true, i.e., premises of the form $[\![g]\!]_\eta = \mathit{true}$ are replaced by $([\![g]\!]_{\eta^\top} = \mathit{true}) \vee ([\![g]\!]_{\eta^\top} = \top)$. For the Timeout and the TDiscard rule, the premise concerning the timer remains $[\![t]\!]_{\eta^\top} = on(0)$. The Receive rule is replaced by I-ReceiveInt and I-ReceiveExt for internal and external reception, where the first one equals the old Receive when $s \notin \mathit{Sig}_{ext}$ and the latter postulates $(l, \eta^\top, q^\top) \xrightarrow{P?(s,\top)} (l, \eta^\top, q^\top :: (s, \top))$ when $s \in \mathit{Sig}_{ext}$. To distinguish notationally the original system and its constituents from the intermediate one of this section, we write $P^\top$ for an intermediate-level process, $S^\top$ for an intermediate-level system, etc.
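The strict lifting and the treatment of ⊤-valued guards can be sketched as a small expression evaluator. The Python code below uses a hypothetical encoding of our own: an expression is a constant, a variable name, or a tuple `(op, lhs, rhs)`, and a sentinel object plays the role of ⊤. Because every operand is evaluated before any operator is applied, the result is ⊤ whenever any variable of the expression is ⊤. This holds even where the concrete result would not depend on it, as in x * 0.

```python
import operator

class _Top:
    """Sentinel for the chaotic value ⊤ (hypothetical encoding)."""
    def __repr__(self):
        return "⊤"

TOP = _Top()

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, "<=": operator.le, "==": operator.eq}

def ev(expr, eta):
    """Strict lifting: constants, variable names, or (op, lhs, rhs) tuples."""
    if isinstance(expr, str):                  # variable
        return eta[expr]
    if isinstance(expr, tuple):                # (op, lhs, rhs)
        op, lhs, rhs = expr
        a, b = ev(lhs, eta), ev(rhs, eta)
        if a is TOP or b is TOP:
            return TOP
        return OPS[op](a, b)
    return expr                                # constant

def guard_enabled(g, eta):
    """A ⊤-valued guard behaves as if it evaluated to true."""
    v = ev(g, eta)
    return v is TOP or v is True

eta = {"x": TOP, "y": 3}
print(ev(("*", "x", 0), eta), guard_enabled(("<", "x", 5), eta),
      guard_enabled(("<", "y", 2), eta))
```

Treating a ⊤-valued guard as enabled is what lets the intermediate system take both branches of any test on chaotic data.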

The interpretation of timer variables on the extended domain requires special attention. Chaos can influence timers only via the set operation, by setting a timer to a chaotic value in the on-state. Therefore, the domain of timer values contains the additional chaotic value on(⊤). Since we need the intermediate system to show at least the behaviour of the original one, we must provide proper treatment of the rules involving on(⊤), i.e., the Timeout, the TDiscard, and the Tick rule. As on(⊤) stands for any value of active timers, it must cover the cases where timeouts and timer discards are enabled (because of on(0)) as well as disabled (because of on(n) with n ≥ 1). The second case is necessary, since the enabledness of the tick steps depends on the disabledness of timeouts and timer discards via the blocked condition.
[Diagram not reproduced; surviving labels: off, on(⊤)]

Fig. 2. Timer Abstraction.



To distinguish the two cases, we introduce a refined abstract value $on(\top^+)$ for chaotic timers, representing all on-settings greater than or equal to 1. The non-deterministic choice between the two alternatives, zero and non-zero, is captured by the rules of Table 4. The order on the domain of timer values is given as the smallest reflexive order relation such that $on(0) \le on(\top)$ and $on(n) \le on(\top^+) \le on(\top)$, for all n ≥ 1. The decreasing operation needed in the Tick rule is defined, extending its definition on values from $on(\mathbb{N})$, on $\top^+$ by $on(\top^+) - 1 = on(\top)$. Note that the operation is left undefined on ⊤, which is justified by a property analogous to Lemma 1.

Lemma 4. Let $(l, \eta^\top, q^\top)$ be a configuration of $S^\top$. If $(l, \eta^\top, q^\top) \xrightarrow{\mathit{tick}}$, then $[\![t]\!]_{\eta^\top} \notin \{on(\top), on(0)\}$, for all timers t.
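The timer values of the intermediate system, their order, and the decrement can be written down directly. In the Python sketch below, the encoding and the function names are our own. The sketch also lists the τ-successors that the rules of Table 4 give to on(⊤). It checks, on a range of values, that ticking respects the abstraction: for every n ≥ 1, $on(n) - 1$ lies below $on(\top^+) - 1 = on(\top)$.

```python
# Hypothetical encoding of timer values in the intermediate system:
# "off", ("on", n) for n >= 0, ON_TOP_PLUS for on(⊤+), ON_TOP for on(⊤).
ON_TOP, ON_TOP_PLUS = "on(⊤)", "on(⊤+)"

def is_on(v):
    return isinstance(v, tuple) and v[0] == "on"

def leq(a, b):
    """Smallest reflexive order with on(0) <= on(⊤) and on(n) <= on(⊤+) <= on(⊤), n >= 1."""
    if a == b:
        return True
    if b == ON_TOP:
        return a == ON_TOP_PLUS or is_on(a)
    if b == ON_TOP_PLUS:
        return is_on(a) and a[1] >= 1
    return False

def dec(v):
    """Decrement for the Tick rule: on(n+1) - 1 = on(n) and on(⊤+) - 1 = on(⊤)."""
    if v == ON_TOP_PLUS:
        return ON_TOP
    if is_on(v) and v[1] >= 1:
        return ("on", v[1] - 1)
    raise ValueError(f"decrement undefined on {v}")   # on(0) and on(⊤), cf. Lemma 4

def chaotic_tau_successors(v):
    """Timer values reachable from on(⊤) in one step by the rules of Table 4."""
    if v == ON_TOP:
        return ["off", ON_TOP_PLUS]   # I-Zero-Timeout / I-Zero-TDiscard, I-NonZero
    return []

print(dec(ON_TOP_PLUS), leq(("on", 0), ON_TOP), leq(("on", 0), ON_TOP_PLUS))
```

Leaving `dec` undefined on on(⊤) forces a chaotic timer to commit to zero or non-zero by a τ-step before time can pass.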

The intermediate system allows us to state the soundness of the analysis: whenever a variable at some location contains a ⊤ value, the analysis has marked it by ⊤.

Theorem 5 (Soundness). Given a system S, its intermediate representation $S^\top$, and $\eta^\alpha$ as the result of the analysis. Assume $\gamma^\top_{init} \rightarrow^* \gamma^\top = (l, \eta^\top, q^\top)$, where $\gamma^\top_{init}$ is the initial configuration of $S^\top$. Then $[\![x]\!]_{\eta^\top} = \top$ implies $[\![x]\!]_{\eta^\alpha_l} = \top$, and $[\![t]\!]_{\eta^\top} \in \{on(\top), on(\top^+)\}$ implies $[\![t]\!]_{\eta^\alpha_l} = \top$.

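Next we make explicit the notion of simulation we will use to prove soundness of the abstraction. The new rule I-NonZero introduces additional τ-steps in the intermediate system that are not present in the original behaviour. Hence, the simulation definition must honour additional τ-steps preceding a tick step.

Table 4. Non-determinism for on(⊤).

$$\frac{[\![t]\!]_{\eta^\top} = on(\top)}{(l, \eta^\top, q^\top) \xrightarrow{\tau} (l, \eta^\top[t \mapsto on(\top^+)], q^\top)}\ \textsc{I-NonZero}$$

$$\frac{l \xrightarrow{g_t \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg} \qquad [\![t]\!]_{\eta^\top} = on(\top)}{(l, \eta^\top, q^\top) \xrightarrow{\tau} (\hat l, \eta^\top[t \mapsto \mathit{off}], q^\top)}\ \textsc{I-Zero-Timeout}$$

$$\frac{\neg\big(l \xrightarrow{\alpha} \hat l \in \mathit{Edg} \wedge \alpha = g_t \,\triangleright\, \text{reset } t\big) \qquad [\![t]\!]_{\eta^\top} = on(\top)}{(l, \eta^\top, q^\top) \xrightarrow{\tau} (l, \eta^\top[t \mapsto \mathit{off}], q^\top)}\ \textsc{I-Zero-TDiscard}$$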

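Definition 6 (Simulation). Given two processes P and P' with sets of configurations Γ and Γ'. Assume further a relation $\le\ \subseteq \Gamma \times \Gamma'$ on configurations and a relation $\le\ \subseteq \mathit{Lab} \times \mathit{Lab}'$ on labels, denoted by the same symbol. A relation $R \subseteq \Gamma \times \Gamma'$ is a simulation if $R \subseteq\ \le$, and if $\gamma\, R\, \gamma'$ and $\gamma \xrightarrow{\lambda} \hat\gamma$ implies one of the following conditions:

1. If $\lambda \neq \mathit{tick}$, then $\gamma' \xrightarrow{\lambda'} \hat\gamma'$ and $\hat\gamma\, R\, \hat\gamma'$, for some configuration $\hat\gamma'$ and for some label $\lambda' \ge \lambda$.

2. If $\lambda = \tau$, then $\hat\gamma\, R\, \gamma'$.

3. If $\lambda = \mathit{tick}$, then $\gamma' = \gamma'_0 \xrightarrow{\tau} \gamma'_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} \gamma'_n \xrightarrow{\mathit{tick}} \hat\gamma'$ for some $n \ge 0$ such that $\hat\gamma\, R\, \hat\gamma'$ and $\gamma\, R\, \gamma'_i$ for all $\gamma'_i$.

We write $P \lesssim P'$ if there exists a simulation relation R such that $\gamma_{init}\, R\, \gamma'_{init}$ for the initial configurations $\gamma_{init}$ and $\gamma'_{init}$ of P and P', respectively. The definition of simulation is used analogously for systems.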
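For finite transition systems, Definition 6 can be checked mechanically for a given candidate relation R. In the Python sketch below, transition systems are dictionaries mapping a configuration to a list of (label, successor) pairs, with the labels "tau" and "tick" reserved. The orders ≤ on configurations and labels are passed in as functions; this encoding and the function names are our own. The check tests the three clauses. For clause 3 it searches the τ-successors of γ' that remain R-related to γ until it finds a tick step into an R-related configuration.

```python
from collections import deque

def tick_matched(g, h, g_next, R, trans2):
    """Clause 3: from h, tau-steps through configurations R-related to g, then a tick into R."""
    seen, todo = {h}, deque([h])
    while todo:
        cur = todo.popleft()                  # (g, cur) in R for every explored cur
        for lab, nxt in trans2.get(cur, []):
            if lab == "tick" and (g_next, nxt) in R:
                return True
            if lab == "tau" and nxt not in seen and (g, nxt) in R:
                seen.add(nxt)
                todo.append(nxt)
    return False

def is_simulation(R, trans1, trans2, leq_conf, leq_label):
    """Check whether R satisfies Definition 6 (finite systems only)."""
    R = set(R)
    for g, h in R:
        if not leq_conf(g, h):                # R must be contained in <=
            return False
        for lab, g_next in trans1.get(g, []):
            if lab == "tick":
                ok = tick_matched(g, h, g_next, R, trans2)            # clause 3
            else:
                ok = (lab == "tau" and (g_next, h) in R) or any(      # clause 2
                    leq_label(lab, lab2) and (g_next, h_next) in R   # clause 1
                    for lab2, h_next in trans2.get(h, []))
            if not ok:
                return False
    return True
```

For instance, a concrete tick `a0 → a1` is matched by the abstract path `b0 →τ b1 →tick b2` only if the candidate relation also relates a0 to the intermediate configuration b1. This is the situation created by the extra τ-steps of I-NonZero.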

The simulation definition is given relative to order relations on configurations and on labels. To establish simulation concretely between S and $S^\top$, we define (in abuse of notation) for labels $\le\ \subseteq \mathit{Lab} \times \mathit{Lab}^\top$ as the smallest relation such that $\tau \le \tau$, and such that $v \le v^\top$ implies $P?(s, v) \le P?(s, v^\top)$ as well as $P!(s, v) \le P!(s, v^\top)$. We use the same symbol for the pointwise extension of ≤ to compare valuations, states, pairs $(s, v) \le (s, v^\top)$, queues, and finally configurations.

Lemma 7. Let S and $S^\top$ as well as ≤ be defined as above. Then $S \lesssim S^\top$.

Proof sketch. It is straightforward to check on the rules of Table 1 that $P \lesssim P^\top$ for single processes. For systems, prove the implication that $P_1 \lesssim P_1^\top$ and $P_2 \lesssim P_2^\top$ imply $P_1 \parallel P_2 \lesssim P_1^\top \parallel P_2^\top$, proceeding similarly by case analysis on the rules of Table 2. There, for the case of Tick, use the fact that $\gamma \le \gamma^\top$ and $\mathit{blocked}(\gamma)$ imply $\gamma^\top = \gamma^\top_0 \xrightarrow{\tau} \gamma^\top_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} \gamma^\top_n = \gamma'^\top$ with $\mathit{blocked}(\gamma'^\top)$ for some n ≥ 0 and $\gamma'^\top$, where furthermore $\gamma \le \gamma^\top_i$ for all $\gamma^\top_i$. □



Lemma 8. Let systems S and $S^\top$ and the relations ≤ be defined as above. Then for all formulas φ from next-free LTL, $S \lesssim S^\top$ and $S^\top \models \varphi$ imply $S \models \varphi$.

4.2 Transformation 

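Based on the result of the analysis, we transform the given system S into an optimized one, denoted by $S^\sharp$, which is closed, does not use the value ⊤, and is in simulation relation with the original system.

In a first approximation, the idea of the transformation is simple: just eliminate actions whose effect, judging from the results of the analysis, cannot be relied on. The transformation is given for each of the syntactic constructs by the rules of Table 5, where we denote a do-nothing statement by skip. The set of variables $\mathit{Var}^\sharp$ for $S^\sharp$ equals the original Var, except that for each process P of the system, a fresh timer variable $t_P$ is added to its local variables, i.e., $\mathit{Var}^\sharp_P = \mathit{Var}_P \cup \{t_P\}$.

Table 5. Transformed System.

$$\frac{l \xrightarrow{g \,\triangleright\, x := e} \hat l \in \mathit{Edg} \qquad [\![e]\!]_{\eta^\alpha_l} \neq \top \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l}}{l \xrightarrow{g^\sharp \,\triangleright\, x := e} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Assign}_1
\qquad
\frac{l \xrightarrow{g \,\triangleright\, x := e} \hat l \in \mathit{Edg} \qquad [\![e]\!]_{\eta^\alpha_l} = \top \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l}}{l \xrightarrow{g^\sharp \,\triangleright\, \mathit{skip}} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Assign}_2$$

$$\frac{l \xrightarrow{?s(x)} \hat l \in \mathit{Edg} \qquad s \notin \mathit{Sig}_{ext}}{l \xrightarrow{?s(x)} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Input}_1
\qquad
\frac{l \xrightarrow{?s(x)} \hat l \in \mathit{Edg} \qquad s \in \mathit{Sig}_{ext}}{l \xrightarrow{g_{t_P} \,\triangleright\, \text{reset } t_P;\ \text{set } t_P := 0} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Input}_2$$

$$\frac{l \xrightarrow{?s(x)} \hat l \in \mathit{Edg} \qquad s \in \mathit{Sig}_{ext}}{l \xrightarrow{g_{t_P} \,\triangleright\, \text{reset } t_P;\ \text{set } t_P := 1} l \in \mathit{Edg}^\sharp}\ \textsc{T-NoInput}$$

$$\frac{l \xrightarrow{g \,\triangleright\, P'!(s,e)} \hat l \in \mathit{Edg} \qquad s \notin \mathit{Sig}_{ext} \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l}}{l \xrightarrow{g^\sharp \,\triangleright\, P'!(s,e)} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Output}_1
\qquad
\frac{l \xrightarrow{g \,\triangleright\, P'!(s,e)} \hat l \in \mathit{Edg} \qquad s \in \mathit{Sig}_{ext} \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l}}{l \xrightarrow{g^\sharp \,\triangleright\, \mathit{skip}} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Output}_2$$

$$\frac{l \xrightarrow{g \,\triangleright\, \text{set } t := e} \hat l \in \mathit{Edg} \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l} \qquad [\![e]\!]_{\eta^\alpha_l} \neq \top}{l \xrightarrow{g^\sharp \,\triangleright\, \text{set } t := e} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Set}_1
\qquad
\frac{l \xrightarrow{g \,\triangleright\, \text{set } t := e} \hat l \in \mathit{Edg} \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l} \qquad [\![e]\!]_{\eta^\alpha_l} = \top}{l \xrightarrow{g^\sharp \,\triangleright\, \text{set } t := 0} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Set}_2$$

$$\frac{l \xrightarrow{g \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg} \qquad g^\sharp = [\![g]\!]_{\eta^\alpha_l}}{l \xrightarrow{g^\sharp \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Reset}
\qquad
\frac{l \xrightarrow{g_t \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg}}{l \xrightarrow{g_t \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg}^\sharp}\ \textsc{T-Timeout}$$

$$\frac{l \xrightarrow{g_t \,\triangleright\, \text{reset } t} \hat l \in \mathit{Edg} \qquad [\![t]\!]_{\eta^\alpha_l} = \top}{l \xrightarrow{g_t \,\triangleright\, \text{reset } t;\ \text{set } t := 1} l \in \mathit{Edg}^\sharp}\ \textsc{T-NoTimeout}$$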

The transformation rules embed the chaotic environment's behaviour into a system. We start with the part not interacting with the environment, i.e., the transformation concerning the manipulation of variables and timers. Variable assignments are either left untouched or replaced by skip, depending on the result of the analysis concerning the left-hand value of the assignment (rules T-Assign1 and T-Assign2). A non-timer guard $g$ at a location $l$ is replaced by true if $[\![g]\!]\eta^\alpha_l = \top$; if not, the guard stays unchanged for the transformed system. We use $g^\flat = [\![g]\!]\eta^\alpha_l$ as shorthand for this replacement in the rules. For chaotic timers, we represent the abstract values $on(\top)$ and $on(\top^+)$ of the intermediate system by the concrete $on(0)$ and $on(1)$, respectively, and directly incorporate the I-NonZero step of Table [?] by the transformation rule T-NoTimeout.

For communication statements, we distinguish between signals going to or coming from the environment, and those exchanged within the system. Output to the outside is basically skipped (cf. rules T-Output1 and T-Output2). Input from outside is treated similarly. However, just replacing input by unconditionally enabled skip-actions would be unsound, because it renders potential tick-steps impossible by ignoring the situation in which the chaotic environment does not send any message. The core of the problem is that, with the timed semantics, a chaotic environment does not just send streams of messages, but "chaotically timed" message streams, i.e., streams with ticks interspersed at arbitrary points.

We embed the chaotic nature of the environment by adding to each process $P$ a new timer variable $t_P$, used to guard the input from outside.¹ These timers behave in the same manner as the old "chaotic" timers, except that we do not allow the new $t_P$ timers to become deactivated (cf. rules T-Input2 and T-NoInput). Since for both input and output the communication statement using an external signal is replaced by a skip, the transformation yields a closed system.
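
The treatment of assignments, guards, and communication described above can be sketched on an explicit edge list. This is a hypothetical encoding, not the paper's rule format. A location-indexed map `chaotic` lists the names (variables or guard expressions) that the analysis judges to be ⊤. The left-hand variable of an assignment is looked up at the edge's target location, which is an assumption. The fresh timer is called `t_P`. Timer operations and rule T-NoInput are not covered.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    guard: str     # guard expression, "true" if unguarded
    action: Tuple  # ("assign", x, e) | ("output", s, e) | ("input", s, x) | ("skip",)


def transform_edge(edge: Edge, chaotic: Dict[str, Set[str]], external: Set[str]) -> Edge:
    at_src = chaotic.get(edge.src, set())
    at_dst = chaotic.get(edge.dst, set())
    kind = edge.action[0]
    # T-Input2 (sketch): input from the environment becomes a step guarded
    # by the fresh timer t_P, which is reset and set to 0 again.
    if kind == "input" and edge.action[1] in external:
        return replace(edge, guard="timeout(t_P)", action=("reset_set", "t_P", 0))
    # Guards judged chaotic at the source location are replaced by true.
    guard = "true" if edge.guard in at_src else edge.guard
    if kind == "assign" and edge.action[1] in at_dst:
        action = ("skip",)        # T-Assign2: chaotic left-hand side
    elif kind == "output" and edge.action[1] in external:
        action = ("skip",)        # T-Output2: output to the outside
    else:
        action = edge.action      # T-Assign1 / T-Output1 / T-Input1
    return replace(edge, guard=guard, action=action)


chaotic = {"l0": {"x > 0"}, "l1": {"y"}}
external = {"env"}
edges = [
    Edge("l0", "l1", "x > 0", ("assign", "y", "z + 1")),
    Edge("l0", "l1", "true", ("output", "env", "y")),
    Edge("l1", "l0", "true", ("input", "env", "x")),
    Edge("l1", "l0", "x > 0", ("output", "ack", "x")),
]
transformed = [transform_edge(e, chaotic, external) for e in edges]
```

In this example, the first edge loses both its chaotic guard and its chaotic assignment. The output to `env` becomes skip, the input from `env` becomes a timer-guarded step, and the internal output on `ack` is kept unchanged.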

The relationship between the intermediate and the transformed program will again be based on simulation (cf. Definition [?]), but with different choices for the order relations on configurations and on labels. Based on the dataflow analysis, the transformation considers certain variable instances as potentially chaotic and unreliable. Hence, to compare configurations of $S^\sharp$ and $S^\flat$, we have to take $\eta^\alpha$ into account. So, relative to a given analysis $\eta^\alpha$, we define the relationship between valuations as follows: $\eta^\alpha_l \models \eta^\sharp \leq \eta^\flat$ iff for all variables $x \in Var$ one of the two conditions holds: $[\![x]\!]\eta^\sharp = [\![x]\!]\eta^\flat$ or $[\![x]\!]\eta^\alpha_l = \top$. Note that nothing is required for the new timer variables $t_P$.
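
The valuation order relative to the analysis translates almost literally into a predicate. The sketch below is minimal and assumes dictionaries for the valuations and for the analysis result at location $l$. `TOP` stands in for ⊤, and `valuation_le` is a hypothetical name.

```python
from typing import Dict, Iterable

TOP = "TOP"  # hypothetical stand-in for the analysis value ⊤


def valuation_le(
    analysis_at_l: Dict[str, object],
    eta_sharp: Dict[str, object],
    eta_flat: Dict[str, object],
    variables: Iterable[str],
) -> bool:
    """eta^alpha_l |= eta_sharp <= eta_flat: each variable of Var either has
    the same value in both valuations or is judged chaotic at l."""
    return all(
        eta_sharp[x] == eta_flat[x] or analysis_at_l.get(x) == TOP
        for x in variables
    )


analysis_at_l = {"x": TOP, "y": 3}
# Only the variables of Var are checked; the extra timer t_P is unconstrained.
ok = valuation_le(analysis_at_l, {"x": 5, "y": 3}, {"x": 0, "y": 3, "t_P": 1}, ["x", "y"])
bad = valuation_le(analysis_at_l, {"x": 5, "y": 3}, {"x": 5, "y": 4, "t_P": 1}, ["x", "y"])
```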

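The set of observable input signals of a process $P$ is defined as

$$Sig^P_{obs} = \{ s \in Sig \setminus Sig_{ext} \mid \neg\exists\, l \xrightarrow{g \rhd P!(s,e)} \hat{l}.\ [\![e]\!]\eta^\alpha_l = \top \}.$$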

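The observable effect of input and output labels is given by the following equations:

$$\lfloor P?(s,v) \rfloor = \begin{cases} \tau & \text{if } s \in Sig_{ext} \\ P?(s,v) & \text{if } s \in Sig^P_{obs} \\ P?(s) & \text{else} \end{cases} \qquad \lfloor P!(s,v) \rfloor = \begin{cases} \tau & \text{if } s \in Sig_{ext} \\ P!(s,v) & \text{if } s \in Sig^P_{obs} \\ P!(s) & \text{else} \end{cases}$$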

For tick- and $\tau$-labels, $\lfloor\cdot\rfloor$ acts as identity. With this definition, we choose as order relation on labels $\lambda^\sharp \leq \lambda^\flat$ iff $\lfloor \lambda^\sharp \rfloor = \lfloor \lambda^\flat \rfloor$. In accordance with this definition, we set $\leq$ on the input queues of $P^\sharp$ and $P^\flat$ inductively as follows: for empty queues, $\epsilon \leq \epsilon$. In the induction case, $(s, v^\sharp) :: q^\sharp \leq q^\flat$ if $s \in Sig_{ext}$ and $q^\sharp \leq q^\flat$. Otherwise, $(s, v^\sharp) :: q^\sharp \leq (s, v^\flat) :: q^\flat$ if $q^\sharp \leq q^\flat$ and furthermore $\lfloor (s, v^\sharp) \rfloor = \lfloor (s, v^\flat) \rfloor$, where $\lfloor\cdot\rfloor$ for queue messages is defined in analogy to the definition for labels. This means that, when comparing the queues, the external messages are ignored for $P^\sharp$, while for the internal messages, the signals must coincide and the value component is compared on the result of the analysis on the potential sending locations. The $\leq$-definitions are extended in the obvious manner to expressions and configurations.

¹ Note that the action $g_{t_P} \rhd reset\ t_P;\ set\ t_P := 0$ in rule T-Input2 corresponds to the do-nothing step $g_{t_P} \rhd skip$.
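
The observation function $\lfloor\cdot\rfloor$ and the induced orders on labels and input queues can be sketched directly. This minimal sketch passes the signal sets explicitly: `external` plays the role of $Sig_{ext}$, and `observable` plays the role of $Sig^P_{obs}$. The tuple encodings of labels and messages, and all function names, are hypothetical.

```python
from typing import List, Set, Tuple, Union

TAU, TICK = "tau", "tick"
# A label is TAU, TICK, or (direction, signal, value) with direction "?" or "!".
Label = Union[str, Tuple[str, str, object]]
Message = Tuple[str, object]  # queue entry (signal, value)


def observe(label: Label, external: Set[str], observable: Set[str]) -> object:
    """Observable effect of a label (identity on tick and tau)."""
    if label in (TAU, TICK):
        return label
    direction, s, v = label
    if s in external:
        return TAU
    if s in observable:
        return (direction, s, v)
    return (direction, s)  # internal but unobservable: value dropped


def label_le(l1: Label, l2: Label, external: Set[str], observable: Set[str]) -> bool:
    return observe(l1, external, observable) == observe(l2, external, observable)


def queue_le(q_sharp: List[Message], q_flat: List[Message],
             external: Set[str], observable: Set[str]) -> bool:
    """Inductive queue order: external messages of q_sharp are skipped; each
    internal one must match the next entry of q_flat up to observation."""
    j = 0
    for s, v in q_sharp:
        if s in external:
            continue
        if j == len(q_flat):
            return False
        s2, v2 = q_flat[j]
        if s != s2 or (s in observable and v != v2):
            return False
        j += 1
    return j == len(q_flat)
```

Because the case distinction in the inductive definition depends only on the head of $q^\sharp$, a single left-to-right scan decides the queue order.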

In order to have the transformed system exhibit only more behaviour than the intermediate one, it must be guaranteed that whenever a guarded edge can be taken in $S^\sharp$, the corresponding guard for $S^\flat$ likewise evaluates to true, where we have to take into account that in the intermediate level, guards with value $\top$ enable the action as well. This property is an immediate consequence of the construction of the guards $g^\flat$ in $S^\flat$.

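Lemma 9. Assume two systems $S^\sharp$ and $S^\flat$ and $\eta^\alpha \models (l, \eta^\sharp) \leq (l, \eta^\flat)$.

1. Let $g$ be a guard of an edge in $S^\sharp$ originating at location $l$, and $g^\flat$ its analogue in $S^\flat$. If $[\![g]\!]\eta^\sharp \in \{true, \top\}$, then $[\![g^\flat]\!]\eta^\flat = true$.

2. If $[\![t]\!]\eta^\sharp \in \{on(0), on(\top)\}$, then $[\![t]\!]\eta^\flat = on(0)$.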

Lemma 10. Let $\gamma^\sharp = (l, \eta^\sharp, q^\sharp)$ be a configuration of $S^\sharp$. Then there exists an input edge starting from $l$, or an edge guarded by $g$ where $[\![g]\!]\eta^\sharp = true$ or $[\![g]\!]\eta^\sharp = \top$.

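Lemma 11. Let $\gamma^\sharp$ and $\gamma^\flat$ be two configurations of $S^\sharp$ and $S^\flat$ such that $\eta^\alpha \models \gamma^\sharp \leq \gamma^\flat$. If $blocked(\gamma^\sharp)$, then $\gamma^\flat = \gamma^\flat_0 \xrightarrow{\tau} \gamma^\flat_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} \gamma^\flat_n$ for some configurations $\gamma^\flat_i$ and some $n \geq 0$ such that $\eta^\alpha \models \gamma^\sharp \leq \gamma^\flat_i$ for all $i$, and $blocked(\gamma^\flat_n)$.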

Lemma 12. Let $S^\sharp$ and $S^\flat$ as well as $\leq$ be defined as above. Then $S^\sharp \lesssim S^\flat$.

Proof sketch. With the help of Lemma [?], it is straightforward to check on the rules of Tables [?] and [?], together with the transformation rules, that for single processes $P^\sharp \lesssim P^\flat$. For systems, prove the implication that $P_1^\sharp \lesssim P_1^\flat$ and $P_2^\sharp \lesssim P_2^\flat$ imply $P_1^\sharp \parallel P_2^\sharp \lesssim P_1^\flat \parallel P_2^\flat$, proceeding similarly by case analysis on the rules of Table [?]. There, for the case of Tick, use Lemma 11. □

Having established simulation between the two levels, we can proceed to the relationship we are really interested in, namely that the transformed system must be a safe abstraction as far as the logic is concerned. Being in simulation relation guarantees preservation of LTL properties as long as variables influenced by chaos are not mentioned. Therefore, we define the set of observable variables as $Var_{obs} = \{ x \in Var \mid \neg\exists l \in Loc.\ [\![x]\!]\eta^\alpha_l = \top \}$. Note that the additional timer variables $t_P$ are unobservable.
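
Given the analysis result at every location, $Var_{obs}$ is a simple filter. The sketch below is minimal: the analysis is assumed to be a map from locations to abstract valuations, and `TOP` and `observable_variables` are hypothetical names.

```python
from typing import Dict, Iterable, Set

TOP = "TOP"  # hypothetical stand-in for the analysis value ⊤


def observable_variables(variables: Iterable[str],
                         analysis: Dict[str, Dict[str, object]]) -> Set[str]:
    """Var_obs: variables that are not chaotic at any location
    (analysis maps location -> variable -> abstract value)."""
    return {
        x for x in variables
        if all(at_l.get(x) != TOP for at_l in analysis.values())
    }


analysis = {"l0": {"x": TOP, "y": 1}, "l1": {"x": 2, "y": 1, "z": 0}}
var_obs = observable_variables(["x", "y", "z"], analysis)
```

Only variables of the original $Var$ are passed in, so the fresh timers $t_P$ never appear in the result.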

Lemma 13. Let the relations $\leq$ and $Var_{obs}$ be defined as above. Then for all formulas $\varphi$ from next-free LTL, $S^\sharp \lesssim S^\flat$ and $S^\flat \models \varphi$ imply $S^\sharp \models \varphi$.

This brings us to the paper's final result: as an immediate consequence of the above development, we obtain the desired property preservation.

Corollary 14. Let $S$, $S^\flat$, and $Var_{obs}$ be defined as before, and let $\varphi$ be a next-free LTL formula mentioning only variables from $Var_{obs}$. Then $S^\flat$ is closed, and $S^\flat \models \varphi$ implies $S \models \varphi$.
5 Conclusion

In this paper, we apply dataflow analysis to transform an open system into a closed, safe abstraction that is well-suited for model checking. The method of embedding chaos has been successfully applied in the context of the Vires project (Verifying Industrial Reactive Systems) [?]. To cope with the complexity of the project's verification case study, an industrial wireless ATM medium-access layer protocol (Mascara) [?], we followed a compositional approach, which immediately raised the problem of closing the modules [?].

Related Work. Closing open (sub-)systems is common in software testing. In this field, a work close to ours in spirit and techniques is [?]. It describes a dataflow algorithm that closes program fragments written in C with the most general environment, while at the same time eliminating the external interface. The algorithm is incorporated into the VeriSoft tool. Similar to the work presented here, its authors assume an asynchronous communication model, but do not consider timed systems and their abstraction. Similarly, [?] consider partial (i.e., open) systems which are transformed into closed ones. To enhance the precision of the abstraction, their approach allows closing the system with an external environment more specific than the most general, chaotic one, where the closing environment can be built to conform to given assumptions, which they call filtering [?]. As in our work, they use LTL as temporal logic and Spin as model checker, but the environment is modelled separately and is not embedded into the system.

A more fundamental approach to model checking open systems, also called reactive modules [?], is known as module checking [?]. Instead of transforming the system into a closed one, the underlying computational model is generalized to distinguish between transitions under control of the module and those driven by the environment. Mocha [?] is a model checker for reactive modules, which uses alternating-time temporal logic [?] as specification language.

Slicing, a well-known program analysis technique, resembles the analysis described in this paper in that it is a data-flow analysis computing, in forward or backward direction, the parts of the program that may depend on certain points of interest (cf. [?] for a survey). The analysis of Section [?] computes in a forward manner the cone of influence of all points of the system influenced from the outside. The usefulness of slicing for model checking is explored in [?], where slicing is used to speed up model checking and simulation for programs in Promela, Spin's input language. However, the program transformation in [?] is not intended to preserve program properties in general. Likewise in the context of LTL model checking, [?] use slicing to cut away irrelevant program fragments, but there the transformation yields a safe, property-preserving abstraction and potentially a smaller state space.

Future Work. While chaos is useful as the most abstract approximation of the environment, one can often verify properties of a component only under assumptions or restrictions on the environment behaviour. For future work, we plan to generalize the framework to also embed environments given by timed LTL formulas. For timers, a more concrete behaviour than just random expiration periods could be extracted automatically from the sub-components by data-flow techniques, leading to a more refined timer abstraction.
Abstract. We address the problem of verifying safety and liveness properties for infinite-state systems using symbolic reachability analysis. The models we consider are fair parametric extended automata, i.e., counter automata with parametric guards, supplied with fairness conditions on their transitions. In previous work, we have shown that symbolic reachability analysis using acceleration techniques can be used to generate finite abstractions (symbolic graphs) of the original infinite-state model. In this paper, we show that this analysis can also be used to introduce fairness conditions on the generated abstract model, allowing us to model-check liveness properties. We first show how to translate the fairness conditions of the original infinite-state model faithfully into conditions on the generated finite symbolic graph. Then, we show that we can also automatically synthesize new fairness conditions that eliminate infinite paths in the symbolic graph which do not correspond to valid behaviours in the original model. These infinite paths correspond to abstractions of boundedly iterable (nested) loops. We show techniques for deciding this bounded iterability for a class of components in the symbolic graph. We illustrate the application of these techniques on nontrivial examples.



1 Introduction 

Symbolic reachability analysis is a powerful paradigm used in the verification of infinite-state systems, such as extended automata, automata communicating through unbounded queues, parameterized systems, etc. It consists in using finite structures to represent infinite sets of configurations, and iterative exploration procedures to compute (a finite representation of) the set of all reachable configurations, or an upper approximation of this set. To help termination, and in some cases to force it, these procedures are often enhanced by acceleration techniques, which compute in one step the effect of sequences of transitions instead of one single transition of the system. Typically, an acceleration step corresponds to the computation of (an upper approximation of) the set of configurations reachable by iterating some sequence of transitions (a circuit in the control graph of the system) an arbitrary number of times. For instance, starting from an initial value $x = 0$, iterating a transition which increments the variable $x$ by 2 leads to the set of configurations $\{0, 2, 4, \ldots\}$, which can be represented by the constraint $x = 2n$ with $n \geq 0$. Acceleration techniques compute this finite representation in one step instead of computing the infinite sequence of approximations $\{0\}, \{0, 2\}, \{0, 2, 4\}, \ldots$
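
The contrast between naive iteration and acceleration can be illustrated on the running example. This is a minimal sketch. `Progression` is a hypothetical symbolic representation of the set $\{base + step \cdot n \mid n \geq 0\}$, and it covers only a single counter translated by a constant. The acceleration techniques discussed in this paper handle far more general circuits.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Progression:
    """Symbolic set {base + step * n | n >= 0} (hypothetical representation)."""
    base: int
    step: int

    def contains(self, x: int) -> bool:
        if self.step == 0:
            return x == self.base
        d = x - self.base
        return d % self.step == 0 and d // self.step >= 0


def accelerate(x0: int, inc: int) -> Progression:
    """One acceleration step for the loop x := x + inc, starting at x = x0."""
    return Progression(x0, inc)


def naive_approximations(x0: int, inc: int, k: int) -> List[Set[int]]:
    """The first k sets of the non-terminating iteration {x0}, {x0, x0 + inc}, ..."""
    sets: List[Set[int]] = []
    current = {x0}
    for _ in range(k):
        sets.append(set(current))
        current = current | {x + inc for x in current}
    return sets


reach = accelerate(0, 2)               # represents x = 2n with n >= 0
approx = naive_approximations(0, 2, 3)
```

Every naive approximation is contained in the accelerated set, but no finite number of naive steps reaches it.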

The construction of the set of reachable configurations allows checking safety properties on the fly. More interestingly, this construction can also be used to generate finite abstractions of the analyzed model, on which standard finite-state model-checking procedures can be applied. Indeed, given any finite partition of the set of configurations, it is possible to construct a finite symbolic graph (the quotient graph according to this partition) which simulates the original system. In general, the exploration procedures used to generate the reachability set produce a finite symbolic graph when they terminate. The nodes of this graph are symbolic configurations (sets of configurations), and the edges correspond to the application of transitions of the system or of acceleration steps (meta-transitions). Since the symbolic graph simulates the analyzed system, if it satisfies a property, then the original system also satisfies it (we suppose here that properties involve only universal path quantification, e.g., linear-time properties).
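
The quotient construction can be sketched for an explicit finite transition system. The sketch below is minimal. The function name, the example counter, and the partition into `"small"` and `"big"` values are hypothetical, chosen only for illustration.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def quotient_graph(
    states: Iterable[Hashable],
    successors: Callable[[Hashable], Iterable[Hashable]],
    block_of: Callable[[Hashable], Hashable],
) -> Tuple[Set[Hashable], Set[Tuple[Hashable, Hashable]]]:
    """Quotient of an explicit transition system by a finite partition:
    block B has an edge to B2 whenever some state of B has a successor in B2."""
    states = list(states)
    nodes = {block_of(s) for s in states}
    edges = {(block_of(s), block_of(t)) for s in states for t in successors(s)}
    return nodes, edges


# Counter x in 0..9, incremented by 2 while it stays below 10,
# partitioned into "small" (x < 4) and "big" (x >= 4).
nodes, edges = quotient_graph(
    range(10),
    lambda x: [x + 2] if x + 2 < 10 else [],
    lambda x: "small" if x < 4 else "big",
)
```

Every concrete path maps to a path in the quotient, which is why the quotient simulates the original system. The self-loop on `"big"` also admits an infinite path that no concrete run follows, because the counter stops at 8 or 9. This is the kind of spurious loop that the additional fairness conditions discussed below must rule out.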

In previous works, we have shown that this approach can be applied to verify safety properties fully automatically for infinite-state models such as communicating automata with unbounded queues [?] and parametric extended automata (automata with clocks and counters) [?]. However, while very successful in the case of safety properties, this approach generally fails when applied to the verification of liveness properties. This is because, almost always, liveness properties only hold under fairness conditions saying that infinite iterations of certain transitions are not allowed. Therefore, the question we address is the following: given an infinite-state model with fairness conditions, how can fairness conditions be introduced automatically on the finite abstract model generated by means of symbolic reachability analysis, in order to be able to model-check liveness properties?

In fact, the fairness conditions that must be introduced on the symbolic graph are of two sorts. First, the original infinite-state model may have fairness conditions expressed on its transitions (in its infinite configuration graph). These conditions must be translated faithfully onto the transitions of the finite abstract model (the symbolic configuration graph). Moreover, additional fairness conditions must be inferred, since the finite abstract model often has loops that do not correspond to a possible infinite execution path in the original model. For instance, consider a transition $\theta$ which increments a variable $x$ by 2 if $x < M$, where $M$ is a positive integer parameter. The set of configurations reachable from the initial value $x = 0$ after executing $\theta$ at least once is:

$$x = 2(n+1) \wedge n \geq 0 \wedge 2n < M$$

Our exploration procedure (with acceleration) will compute this set. It will also produce a symbolic configuration graph with a loop on the set given above, since this set is closed under the application of $\theta$. However, the iteration of $\theta$ is clearly bounded due to the guard $x < M$. The problem is to infer this fact automatically and to express it as a fairness constraint on the symbolic graph. This problem can be more general than the one illustrated by this example and may concern a whole component of the symbolic graph, corresponding to nested loops and not only to a simple one. However, this problem is clearly as hard as deciding termination of programs (the halting problem of Turing machines) and cannot be solved automatically in general.
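
For a fixed value of the parameter, the constraint can be cross-checked against an explicit enumeration of the values of $x$ reached after one or more executions of $\theta$. This is a minimal sketch with hypothetical function names. It samples finitely many values of $M$ and is a sanity check, not a proof.

```python
from typing import Set


def reachable_after_theta(M: int) -> Set[int]:
    """Values of x reached from x = 0 after one or more executions of
    theta: if x < M then x := x + 2 (explicit enumeration for a fixed M)."""
    seen: Set[int] = set()
    frontier = {0}
    while frontier:
        frontier = {x + 2 for x in frontier if x < M} - seen
        seen |= frontier
    return seen


def constraint_set(M: int) -> Set[int]:
    """Values allowed by x = 2(n + 1), n >= 0, 2n < M (for M > 0)."""
    return {2 * (n + 1) for n in range(M) if 2 * n < M}
```

For every fixed $M$ both sets are finite, which is exactly the bounded iteration that the loop in the symbolic graph hides.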

The question of extending symbolic reachability techniques in order to gen- 
erate finite abstractions with fairness conditions can be addressed in a similar 
manner for several kinds of infinite-state models. To illustrate our approach, we 
consider in this paper only numerical models which are fair parametric extended 
automata, i.e., automata with integer counters that can be reset, incremented, 
and compared against parameters. The fairness conditions are expressed on the 
transitions of the model under constraints on the source configurations of these 
transitions. Constraints on these source configurations are expressed using pred- 
icates on the variables of the system. 

We show that our reachability analysis procedures and the acceleration techniques we developed in [?] can be exploited and adapted in order to deal with fairness conditions.

First, we show how the translation of fairness conditions from the original model to the abstract one can be done using symbolic reachability analysis. The idea is simple and consists in considering an extended model with additional boolean variables corresponding to the predicates involved in the fairness conditions. This forces our symbolic analysis procedure to split the abstract model according to these predicates. This makes the translation of the fairness conditions onto the transitions of the resulting symbolic graph easy, and ensures the soundness of this translation.

Then, we consider the harder problem of inferring new fairness conditions corresponding to bounded iterability of loops, and more generally to bounded iterability of some kinds of connected components (nested loops). The main intuitive idea we develop is that, to perform an acceleration step, our techniques analyze the transitions in order to compute the effect of their iterations. From this analysis, we can actually extract information about the bounded or unbounded iterability of these transitions. If we consider again the example mentioned above, we can determine automatically that the transition $\theta$ adds the value 2 to $x$ each time it is executed. From this fact, we first infer that after executing the transition $\theta$ $n + 1$ times starting from $x = 0$, we get the value $x = 2(n+1)$, and we also know that $2n < M$ must hold, since the guard must be satisfied. This is basically what our acceleration techniques do in order to compute the set of reachable configurations. Now, it is also possible to infer from the fact above that $\theta$ can only be iterated a finite number of times. This can be done by deciding whether there exists an $n$ such that the value obtained after $n$ iterations of $\theta$ (i.e., $0 + 2n$) falsifies the guard, that is, an $n$ such that $2n \geq M$. This is obviously true in this example, which allows us to infer the boundedness of the loop. We show that this basic idea allows deciding the bounded iterability of any elementary circuit in the symbolic graph. Moreover, we show that this result can be extended to some classes of nested loops (connected components), by showing that the analysis of the iterability of these components can be reduced to the analysis of simple loops.
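
The boundedness argument can be sketched for the simplest class of circuits. The assumptions, which are narrower than the symbolic graphs treated in this paper, are these: each counter is translated by a fixed integer per iteration (no resets), and each guard atom compares one counter with a parameter. Under these assumptions, some atom is eventually falsified from every start value exactly when a counter bounded from above increases, or a counter bounded from below decreases. Function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

# Guard atom (counter, op, parameter): "counter op parameter", op in <, <=, >, >=.
Atom = Tuple[str, str, str]


def boundedly_iterable(guard: List[Atom], effect: Dict[str, int]) -> bool:
    """The circuit translates each counter x by effect.get(x, 0) per iteration.
    Returns True when some atom is driven towards violation; then, for every
    start value and parameter valuation, only finitely many iterations are
    possible. Otherwise, any state satisfying the guard iterates forever."""
    for x, op, _param in guard:
        c = effect.get(x, 0)
        if op in ("<", "<=") and c > 0:
            return True
        if op in (">", ">=") and c < 0:
            return True
    return False


def iterations_until_violation(x0: int, inc: int, bound: int) -> int:
    """For the guard x < bound and x := x + inc with inc > 0: the number of
    executions possible from x0, i.e. the least n with x0 + inc * n >= bound."""
    if inc <= 0:
        raise ValueError("inc must be positive")
    if x0 >= bound:
        return 0
    return -(-(bound - x0) // inc)  # ceiling division
```

For $\theta$ (guard $x < M$, effect $+2$), `boundedly_iterable([("x", "<", "M")], {"x": 2})` holds. With $M = 5$, exactly three executions are possible from $x = 0$, matching the reachable values 2, 4, and 6.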

The same approach can also be applied to timed models (with real-valued 
clocks) under some restrictions, essentially time-determinism (transitions in the 
analyzed loops must take fixed time values which may be parameterized). 
We show that our techniques can be applied to many nontrivial examples. In particular, we consider a parametric timed model of the Bounded Retransmission Protocol (with unbounded counters and clocks, and parametric values for the number of retransmissions and for the timeouts), and show that our techniques automatically generate a finite abstraction of the system together with fairness constraints, which allow model-checking the liveness properties of the protocol in addition to its safety properties. Other examples illustrate the case where the ability to analyze nested loops is needed.



Related Work: There are many works on the symbolic reachability analysis of infinite-state systems using acceleration techniques [?]. These works mainly consider the case of safety properties. In [?], techniques for the automatic verification of liveness properties are proposed in the framework of regular model checking, for the particular case of length-preserving transformations on sequences, e.g., actions of parameterized networks of processes connected sequentially. In [?], techniques for translating fairness conditions from a concrete to a finite abstract model, as well as a procedure for computing safe fairness conditions on the abstract model, are proposed for the particular case of parameterized sequential networks of processes.



There are many works on proving liveness properties, and in particular the termination of programs [?]. These works do not propose fully automatic techniques, but are based on a general proof principle which consists in finding a ranking function on configurations that decreases along each execution path according to a suitable well-founded ordering. A few recent works try to automate this principle and propose heuristics for the generation of such ranking functions in some restricted cases [?]. These works reason about the structure of the original model, whereas our techniques are applied to the structure of the symbolic graph generated by the reachability analysis. This difference is important. Indeed, in our symbolic graph, according to our acceleration techniques, loops of the original model are unfolded until we find sequences of transitions that have a periodic effect. This unfolding helps both in computing the effect of iterated transitions and in reasoning about the boundedness of these iterations. It turns out that our techniques are powerful enough to deal with complex cases for which, if we reasoned directly on the original model, it would be necessary to define a ranking function decreasing according to a lexicographic ordering on the variables, which is hard to guess automatically.

Outline: In Section 2, we give some basic definitions and introduce the kind of constraints and operations used in our models. In Section 3, we introduce the model of fair parametric extended automata and its semantics. In Section 4, we recall from [?] the symbolic structures, the reachability algorithm, and the extrapolation techniques. In Section 5, we show how the concrete fairness conditions may be translated into fairness constraints at the abstract level. In Section 6, we present the conditions needed and the methods used in order to synthesize fairness conditions for simple loops of the symbolic graph. In Section 7, we show how the results for simple loops can be extended to nested loops.
2 Preliminaries

Let $X$ be a set of variables and let $x$ range over $X$. The set of arithmetical terms over $X$, denoted $AT(X)$, is defined by the grammar:

$$t ::= 0 \mid 1 \mid x \mid t - t \mid t + t \mid t * t$$

The set of first-order arithmetical formulas over $X$, denoted $FO(X)$, is defined by the grammar:

$$\phi ::= t \leq t \mid \neg\phi \mid \phi \vee \phi \mid \exists x.\ \phi \mid IsInt(t)$$

Formulas are interpreted over the set of reals. The predicate $IsInt$ expresses the fact that a term has an integer value. The fragment of $FO(X)$ consisting of formulas without the $IsInt$ predicate is called the first-order arithmetic of reals and is denoted $RFO(X)$. The fragment of $FO(X)$ consisting of formulas without multiplication ($*$) is called linear arithmetic and is denoted $LFO(X)$. It is well known that satisfiability in $FO(X)$ is undecidable, whereas it is decidable for both fragments $RFO(X)$ and $LFO(X)$.

Let $P$ be a set of parameters. Parameters can be seen as variables that are not modified by the system (they keep their initial values all the time). A simple parametric constraint is then a conjunction of formulas of the form $x \prec t$ or $x - y \prec t$, where $x, y \in X$, $\prec \in \{<, \leq\}$, and $t \in AT(P)$. We denote by $SC(X, P)$ the set of simple parametric constraints.

We consider simple operations on variables, corresponding to sets of special kinds of assignments. We allow assignments of variables that are either of the form $x := y + 1$ or of the form $x := t$, where $x, y \in X$ are variables ($x$ and $y$ may be the same variable) and $t \in AT(P)$.

Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be the vector of all variables under consideration. Simple operations correspond to assignments of the form $\mathbf{x} := A\mathbf{x} + \mathbf{b}$, where $\mathbf{b}$ is an $n$-dimensional vector of terms and $A$ is an $n \times n$ $\{0,1\}$-matrix such that each row has at most one non-zero (1) value. We call such $n \times n$ $\{0,1\}$-matrices simple matrices.
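
Applying a simple operation $\mathbf{x} := A\mathbf{x} + \mathbf{b}$ can be sketched directly. This minimal sketch assumes that the terms of $\mathbf{b}$ have already been evaluated under a parameter valuation. The function names are hypothetical.

```python
from typing import List, Sequence


def is_simple_matrix(A: Sequence[Sequence[int]]) -> bool:
    """Square {0,1}-matrix with at most one 1 per row."""
    n = len(A)
    return all(
        len(row) == n and all(a in (0, 1) for a in row) and sum(row) <= 1
        for row in A
    )


def apply_simple_operation(A: Sequence[Sequence[int]], b: Sequence[int],
                           x: Sequence[int]) -> List[int]:
    """Computes x := A x + b, with the terms of b already evaluated."""
    if not is_simple_matrix(A) or len(b) != len(A) or len(x) != len(A):
        raise ValueError("not a simple operation on this vector")
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


# x1 := x2 + 1 and x2 := t, where t evaluates to 5 under some parameter valuation.
A = [[0, 1], [0, 0]]
b = [1, 5]
result = apply_simple_operation(A, b, [3, 7])
```

A row containing a single 1 copies one variable (possibly plus a constant), and an all-zero row assigns the term $b_i$ directly. These are exactly the two assignment forms above.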



3 Fair Parametric Extended Automata 

We consider models which are automata supplied with integer counters and real-valued clocks. Counters and clocks can be reset using simple operations, and they can be tested using simple parametric constraints (they can be compared to parameters in these constraints). Moreover, we consider fairness conditions on the transitions of these models. To simplify the presentation, we focus here on the case where only counters are considered. A Parametric Extended Automaton (PEA) is a tuple $T = (Q, X, P, \delta)$ where:

- $Q$ is a finite set of control states,
- $X$ is a finite set of counters ranging over the set of positive integers $\mathbb{N}$,
- $P$ is a finite set of parameters ranging over positive integers,
- $\delta$ is a finite set of guarded actions of the form $(q_1, g(X,P), sop, q_2)$, where $q_1, q_2 \in Q$, $g(X,P) \in SC(X,P)$ is a guard, and $sop$ is a simple operation over $X$.

A configuration of $T$ is a triple $\gamma = (q, \mu, \pi)$ where $q$ is a control state, and $\mu$ and $\pi$ are valuations of the counters and parameters, respectively. For each guarded action $\tau = (q_1, g, sop, q_2) \in \delta$, we define a transition relation between configurations: $(q_1, \mu_1, \pi_1) \xrightarrow{\tau} (q_2, \mu_2, \pi_2)$ iff $\pi_1 = \pi_2$, $g(\mu_1, \pi_1)$ is true, and $\mu_2 = sop(\mu_1)$. A computation sequence is then a sequence $\gamma_0 \tau_0 \gamma_1 \tau_1 \gamma_2 \ldots$ such that $\gamma_i \xrightarrow{\tau_i} \gamma_{i+1}$ for every $i \geq 0$, and either it is infinite, or it is finite but maximal (cannot be prolonged).
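
The transition relation of a PEA can be sketched with guards and simple operations given as Python functions. This is a minimal sketch. The encoding, the helper `run` (which simply takes the first enabled action), and the example action `theta` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]
Config = Tuple[str, Valuation, Valuation]  # (control state, counters mu, parameters pi)


@dataclass(frozen=True)
class Action:
    src: str
    guard: Callable[[Valuation, Valuation], bool]  # g(mu, pi)
    sop: Callable[[Valuation], Valuation]          # simple operation on counters
    dst: str


def successors(config: Config, actions: List[Action]) -> List[Tuple[Action, Config]]:
    """All (tau, gamma2) with gamma --tau--> gamma2; parameters never change."""
    q, mu, pi = config
    return [(a, (a.dst, a.sop(mu), pi))
            for a in actions if a.src == q and a.guard(mu, pi)]


def run(config: Config, actions: List[Action], max_steps: int) -> List[Config]:
    """A computation prefix that always takes the first enabled action and
    stops early at a configuration without successors (a maximal sequence)."""
    trace = [config]
    for _ in range(max_steps):
        succ = successors(trace[-1], actions)
        if not succ:
            break
        trace.append(succ[0][1])
    return trace


# theta: in state q, if x < M then x := x + 2.
theta = Action("q", lambda mu, pi: mu["x"] < pi["M"],
               lambda mu: {**mu, "x": mu["x"] + 2}, "q")
trace = run(("q", {"x": 0}, {"M": 5}), [theta], max_steps=10)
```

With $M = 5$, the run visits $x = 0, 2, 4, 6$ and then stops, because the guard fails at $x = 6$. The resulting finite sequence is therefore maximal.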

We introduce constrained fairness conditions on PEAs, expressed on their transitions. Let a constrained action be a pair $(\tau, f(X, P))$, where $\tau$ is a guarded action and $f(X, P) \in FO(X, P)$ is a formula expressing a constraint on the sources of $\tau$. A Fair PEA is a tuple $(Q, X, P, \delta, W, B)$, where $(Q, X, P, \delta)$ is a PEA, and $W$ and $B$ are a set and a set of sets of constrained actions, respectively.

The set $W$ (called the justice set) expresses weak fairness conditions saying that whenever an action is permanently enabled, it must be executed at some point in the future (hence, it will be executed infinitely often). The set $B$ (called the boundedness set) expresses bounded iterability conditions saying that, for each $U \in B$, if all actions in $U$ are taken infinitely often, then the same must hold for some action not in $U$. Formally, using the notation $\exists^\infty$ for "there exist infinitely many", and $T(U)$ for the set of transitions appearing in $U$, a computation sequence $\sigma = \gamma_0 \tau_0 \gamma_1 \tau_1 \gamma_2 \ldots$ is fair iff:

- $\forall (\tau, f) \in W.\ \forall i \geq 0$, if $\forall j \geq i.\ (\gamma_j \models f \wedge \exists \gamma.\ \gamma_j \xrightarrow{\tau} \gamma)$, then $\exists k \geq i.\ \tau_k = \tau$,
- $\forall U \in B$, if $\forall (\tau, f) \in U.\ \exists^\infty j.\ (\tau_j = \tau \wedge \gamma_j \models f)$, then $\exists \tau' \notin T(U).\ \exists^\infty k.\ \tau_k = \tau'$.

The following example shows how constrained fairness conditions can be used to express nontrivial fairness conditions.





Fig. 1. Example for Constrained Fairness (panels: process $P_i$; fair PEA for the network $P_1 \parallel \cdots \parallel P_n$).
Example 1. Consider the parameterized network $P_1 \parallel \cdots \parallel P_n$, where each process $P_i$ is described in Figure 1. At control state $q_0$, the processes compete for a resource. Process $P_i$ has the resource when $t = i$, and it moves to control state $q_1$ when it no longer needs the resource. A process of smaller index can, however, preempt a process of bigger index. It is natural to assume that a process that has the resource and is never preempted eventually reaches control state $q_1$.

We are interested in verifying that eventually all processes reach $q_1$, under the assumption that all transitions are weakly fair. The parameterized system can be modeled by a fair parametric extended automaton, as shown in Figure 1. In this automaton, $x$ counts the processes in control state $q_0$. The property we want to check is then whether eventually $x = 0$, i.e., $\Diamond(x = 0)$.

Clearly, this property is not satisfied if we do not assume any fairness on the transition $\tau_0$ decrementing $x$. Assuming that this transition is weakly fair is, however, too strong: if we remove the guard $t > i$ in $P_i$, the system $P_1 \parallel \cdots \parallel P_n$ no longer satisfies the property of interest, but the fair extended automaton still does. In fact, what we would like to express is that if the value of $t$ remains unchanged forever, then eventually a transition decrementing $x$ is taken. To express this property, we add a variable $t_o$ (for "$t$ old") to the abstract system, and add the assignment $t_o := t$ to the transition $\tau_1$. The constrained weak fairness condition $(\tau_0, t_o = t)$ then expresses the property we want.



4 Symbolic Reachability Analysis 

We now present the main principles of the symbolic reachability analysis techniques developed in [?]. For a PEA $T$, these techniques build (when they terminate) a symbolic reachability graph $SG(T) = (F, E)$, where $F$ is a finite set of structures representing (infinite) sets of configurations of $T$, and $E \subseteq F \times F$ is a finite set of transitions between structures in $F$, corresponding to transitions of $T$. The fairness conditions are defined similarly to Fair PEAs. The constrained edges of the symbolic graph are pairs $(e, b)$ with $e \in E$ and $b$ a boolean expression (in $SC(X, P)$) on the source configuration of $e$. A Fair $SG(T)$ is a tuple $(F, E, W_{SG}, B_{SG})$, where $W_{SG}$ (the justice set) is a set of constrained edges and $B_{SG}$ (the boundedness set) is a set of sets of constrained edges.



4.1 Symbolic Representation Structures 

In order to represent sets of configurations of PEAs, we use Constrained Parametric Difference Bound Matrices, introduced in [?], which are an extension of the Difference Bound Matrices (DBMs) [?] used for representing reachability sets of (non-parametric) timed automata [?].

Definition 1. Let $T = (Q, X, P, \delta)$ be a PEA, let $X = \{x_1, \ldots, x_n\}$ be the set of counters of T, and let $x_0$ be an additional counter whose value is always equal to 0. A parametric difference bound matrix (PDBM) representing a set of configurations over X is an $(n+1) \times (n+1)$ matrix M of elements in $AT(P) \times \{<, \leq\}$. Each entry $M[i,j] = (t, \prec)$ encodes the constraint $x_i - x_j \prec t$. Given a valuation $\pi$ of the parameters, the semantics of M is defined by $[\![M]\!]_\pi = [\![\bigwedge_{i,j} M(i,j)]\!]_\pi$.

A constrained PDBM is a pair $S = (M, \phi)$, where M is a PDBM and $\phi$ is a constraint in $FO(P)$. The semantics of $(M, \phi)$ is defined by $[\![(M, \phi)]\!] = \{(\mu, \pi) \mid \mu \in [\![M]\!]_\pi,\ \pi \in [\![\phi]\!]\}$.
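To make Definition 1 concrete, the following Python sketch checks whether a concrete counter valuation belongs to $[\![M]\!]_\pi$ and to a constrained PDBM. It is only an illustration under our own assumptions: parametric terms are modelled as Python functions of a parameter valuation, and the names `in_pdbm`, `in_constrained_pdbm` and `const` are hypothetical, not part of TReX.

```python
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]
# A PDBM entry (t, strict): t maps a parameter valuation pi to a bound,
# strict=True encodes "<" and strict=False encodes "<=".
Entry = Tuple[Callable[[Valuation], float], bool]
PDBM = List[List[Entry]]


def const(c: float, strict: bool = False) -> Entry:
    """Entry whose bound does not depend on the parameters."""
    return (lambda pi: c, strict)


def in_pdbm(M: PDBM, mu: List[int], pi: Valuation) -> bool:
    """mu in [[M]]_pi: every difference x_i - x_j respects entry M[i][j]."""
    if mu[0] != 0:  # x_0 is the extra counter fixed to 0
        return False
    n = len(M)
    for i in range(n):
        for j in range(n):
            term, strict = M[i][j]
            bound, diff = term(pi), mu[i] - mu[j]
            if diff > bound or (strict and diff == bound):
                return False
    return True


def in_constrained_pdbm(M: PDBM, phi: Callable[[Valuation], bool],
                        mu: List[int], pi: Valuation) -> bool:
    """(mu, pi) in [[(M, phi)]]: pi satisfies phi and mu is in [[M]]_pi."""
    return phi(pi) and in_pdbm(M, mu, pi)


# Example: one counter x_1 with 0 <= x_1 <= p for a parameter p.
M_example: PDBM = [
    [const(0), const(0)],                     # x0 - x0 <= 0, x0 - x1 <= 0
    [(lambda pi: pi["p"], False), const(0)],  # x1 - x0 <= p, x1 - x1 <= 0
]
```

With $p = 5$, the valuation $x_1 = 3$ is accepted and $x_1 = 6$ is rejected; a constraint $\phi$ on the parameters filters the valuations $\pi$ first.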

The extrapolation procedure [?] used to help termination of the reachability analysis introduces new parameters, called iteration parameters. These parameters correspond to numbers of iterations of control loops. Let N be the set of these new parameters. The definition of constrained PDBM is extended to represent sets of configurations by means of variables in N.

Definition 2. An open PDBM is a PDBM whose elements are terms in $AT(P \cup N) \times \{<, \leq\}$. An open constrained PDBM is a constrained PDBM $(M, \phi)$ where M is an open PDBM and $\phi$ is a constraint in $FO(P \cup N)$.

The semantics of open PDBMs and open constrained PDBMs are defined similarly, by also considering valuations $\nu$ of the iteration parameters in N. Given an open PDBM M (resp. an open constrained PDBM $S = (M, \phi)$), we denote by $Iter(M)$ (resp. $Iter(S)$) the set of iteration variables appearing in M (resp. in M or $\phi$).

Definition 3. A symbolic configuration $\Gamma$ is a pair $(q, S)$, where $q \in Q$ is a control state and S is a constrained PDBM. A symbolic configuration $\Gamma = (q, S)$ includes $\Gamma' = (q', S')$ (notation $\Gamma' \sqsubseteq \Gamma$) if $q = q'$ and $[\![S']\!] \subseteq [\![S]\!]$.

[?] shows how the standard operations on DBMs (transformation into a canonical form, intersection, and inclusion test) can be lifted to (open) constrained PDBMs. We have implemented in C++ a package providing data structures and operations for manipulating (open) constrained PDBMs over integer or real variables. This package is distributed with the tool TReX¹.
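For comparison, the standard non-parametric DBM operations mentioned above can be sketched in a few lines of Python: closure into canonical form by Floyd-Warshall shortest paths, intersection as an entrywise minimum, and inclusion by comparing the canonical form of the smaller matrix entrywise. This sketch assumes integer bounds and non-strict constraints only; the parametric version lifted to constrained PDBMs must compare terms with a decision procedure instead, which is not shown here.

```python
from typing import List

Matrix = List[List[float]]  # M[i][j] bounds x_i - x_j (non-strict); inf = no bound


def canonical(M: Matrix) -> Matrix:
    """Tightest equivalent DBM via Floyd-Warshall shortest paths."""
    n = len(M)
    C = [row[:] for row in M]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if C[i][k] + C[k][j] < C[i][j]:
                    C[i][j] = C[i][k] + C[k][j]
    return C


def is_empty(M: Matrix) -> bool:
    """A DBM is unsatisfiable iff its closure exhibits a negative cycle."""
    C = canonical(M)
    return any(C[i][i] < 0 for i in range(len(C)))


def intersect(M: Matrix, N: Matrix) -> Matrix:
    """Conjunction of two constraint sets: keep the smaller bound entrywise."""
    return [[min(a, b) for a, b in zip(r, s)] for r, s in zip(M, N)]


def included(Mp: Matrix, M: Matrix) -> bool:
    """[[Mp]] subset of [[M]]: compare the canonical form of Mp entrywise."""
    if is_empty(Mp):
        return True
    C = canonical(Mp)
    n = len(C)
    return all(C[i][j] <= M[i][j] for i in range(n) for j in range(n))
```

For instance, with counters $(x_0, x_1)$, the zone $1 \leq x_1 \leq 3$ is included in $0 \leq x_1 \leq 5$ but not conversely.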

The operations on (open) constrained PDBMs manipulate arithmetical terms in $AT(P)$ and boolean formulas over comparisons of the form $t \prec t'$, with $t, t' \in AT(P)$ and $\prec \in \{<, \leq\}$. We have implemented in TReX a package for representing these terms and formulas compactly, using arithmetical and logical simplification rules (symmetry of arithmetical operations, factorization, properties of boolean connectives, etc.). Our representation also records the nature of terms (real/integer, linear/non-linear) and of formulas (in $LFO(X, P)$ or $RFO(X, P)$). This allows us to invoke the suitable decision procedure: Omega [?] in the integer linear case, or Reduce [?] otherwise.



4.2 Building Symbolic Reachability Graphs 

Let T be a PEA. The procedure proposed in [?] computes, given an initial symbolic configuration $\Gamma_0$, the symbolic reachability graph $SG(T)$ representing the set $post^*(\Gamma_0)$. To this end, starting from $\Gamma_0$, the transitions of T are applied, when possible, to build new vertices of $SG(T)$. For a transition $\tau$ of T and a symbolic configuration $\Gamma$, we denote by $post_\tau(\Gamma)$ the vertex computed by applying $\tau$ to $\Gamma$. The new vertices of $SG(T)$ are processed in depth-first order. The construction stops when each newly generated vertex is covered by (included in) some already computed vertex.

¹ http://www.liafa.jussieu.fr/~sighirea/trex and http://www-verimag.imag.fr/~aimichin/trex
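The depth-first construction with its covering test can be sketched generically in Python, with `post` and `covers` passed in as functions. This is a simplified sketch, not the TReX procedure: we assume that an edge whose target is covered is redirected to the covering vertex, and we omit extrapolation, so the loop need not terminate in general. The toy instance (integer intervals for one counter, a transition `dec` guarded by $x \geq 1$) is hypothetical.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, str, Vertex]


def build_symbolic_graph(
    init: Vertex,
    transitions: List[str],
    post: Callable[[str, Vertex], Optional[Vertex]],
    covers: Callable[[Vertex, Vertex], bool],
) -> Tuple[List[Vertex], List[Edge]]:
    """Depth-first exploration that does not expand covered vertices."""
    vertices: List[Vertex] = [init]
    edges: List[Edge] = []
    stack: List[Vertex] = [init]
    while stack:
        g = stack.pop()  # LIFO order gives a depth-first traversal
        for t in transitions:
            h = post(t, g)
            if h is None:  # t is not applicable to g
                continue
            cover = next((v for v in vertices if covers(v, h)), None)
            if cover is not None:
                edges.append((g, t, cover))  # assumption: edge redirected to the cover
            else:
                vertices.append(h)
                edges.append((g, t, h))
                stack.append(h)
    return vertices, edges


# Hypothetical toy instance: a vertex is an integer interval (lo, hi) for one
# counter x, and transition "dec" has guard x >= 1 and update x := x - 1.
def post_interval(t: str, g: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    lo, hi = max(g[0], 1), g[1]  # intersect with the guard x >= 1
    return None if lo > hi else (lo - 1, hi - 1)


def covers_interval(v: Tuple[int, int], h: Tuple[int, int]) -> bool:
    return v[0] <= h[0] and h[1] <= v[1]
```

Starting from $[0, 5]$, the successor $[0, 4]$ is covered by the initial vertex, so the graph consists of a single vertex with a self-loop.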

During this construction, an extrapolation technique is used to help termination. The extrapolation technique defined in [?] is based on automatically guessing the effect of iterating a control loop (a cycle in the control graph of T) an arbitrary number of times from a given symbolic configuration, and on checking that this guess is exact (does not introduce non-reachable configurations). We present below the main idea of the extrapolation technique.

Let $\theta$ be a control loop in T, i.e., a path $\tau_0 \ldots \tau_k$ with $\tau_i = (q_i, g_i, sop_i, q'_i)$ for all $i \in \{0, \ldots, k\}$, such that $q = q_0 = q'_k$ and $q'_i = q_{i+1}$ for all $i \in \{0, \ldots, k-1\}$. Let $\Gamma = (q, S = (M, \phi))$ be a symbolic configuration. We denote by $post_\theta(\Gamma)$ the computation $(post_{\tau_k} \circ \cdots \circ post_{\tau_0})(\Gamma)$. Suppose that we have computed $S_1 = (M_1, \phi_1)$ and $S_2 = (M_2, \phi_2)$ such that $(q, S_1) = post_\theta(q, S)$ and $(q, S_2) = post_\theta(q, S_1)$. Let $\Delta = M_1 - M$ and $\Delta' = M_2 - M_1$. If $\Delta = \Delta'$ for all values of the parameters, we may guess that the symbolic configuration obtained after n iterations of $\theta$ is $post_\theta^n(q, S) = (q, S_n = (M + n * \Delta, \phi \wedge n > 0))$. To check that this guess is exact, we have to check that for all $n > 0$, applying $\theta$ to $(q, S_n)$ has the effect of adding $\Delta$, provided the guards in $\theta$ are satisfied. This corresponds to the induction step of the exactness proof. If the extrapolation can be applied, $post_\theta^n(q, S)$ is added to the set of vertices V of $SG(T)$, and the transition $((q, S_1), post_\theta^n(q, S))$ to the set of edges E.
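The guessing step can be illustrated on concrete (non-parametric) DBMs in Python: compute $\Delta$ and $\Delta'$ from two applications of the loop, and if they coincide, test the induction step "applying $\theta$ to $M + n * \Delta$ adds $\Delta$". This is a sketch under our own simplifications: guards are ignored, and the induction step is only tested for finitely many n, whereas the actual technique proves it symbolically for all n. The loop bodies `add3` and `reset` are hypothetical.

```python
from typing import Callable, List, Optional

Matrix = List[List[int]]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add_scaled(a: Matrix, d: Matrix, n: int) -> Matrix:
    """Entrywise a + n * d."""
    return [[x + n * y for x, y in zip(ra, rd)] for ra, rd in zip(a, d)]


def guess_loop_effect(post_theta: Callable[[Matrix], Matrix], M: Matrix,
                      check_up_to: int = 50) -> Optional[Matrix]:
    """Guess Delta from two iterations of theta, then test the induction step."""
    M1 = post_theta(M)
    M2 = post_theta(M1)
    delta = sub(M1, M)
    if sub(M2, M1) != delta:  # Delta != Delta': no guess is made
        return None
    # Bounded stand-in for the symbolic induction step (evidence, not a proof).
    for n in range(check_up_to):
        Mn = add_scaled(M, delta, n)
        if post_theta(Mn) != add_scaled(Mn, delta, 1):
            return None
    return delta


def add3(M: Matrix) -> Matrix:
    """Hypothetical loop body x1 := x1 + 3 on a 2x2 DBM over (x0, x1)."""
    return [[M[0][0], M[0][1] - 3], [M[1][0] + 3, M[1][1]]]


def reset(M: Matrix) -> Matrix:
    """Hypothetical loop body x1 := 0 (independent of the input zone)."""
    return [[0, 0], [0, 0]]
```

On the zone $0 \leq x_1 \leq 5$, `add3` yields the guess $\Delta = \begin{pmatrix} 0 & -3 \\ 3 & 0 \end{pmatrix}$, while `reset` produces $\Delta \neq \Delta'$ and no guess.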

5 Translation of Fairness Constraints 

We explain how the symbolic reachability analysis presented in Section 4 can be adapted to take concrete fairness conditions into account. To do so, consider a set of constrained actions $(\tau_i, f_i)$, for $i = 1, \ldots, n$. We associate with each predicate $f_i$ a boolean variable $b_i$. Then, we extend the symbolic configurations $\Gamma$ to pairs $(\Gamma, \nu)$, where $\nu$ is a valuation of the $b_i$'s such that if $\nu(b_i)$ is true (resp. false), then there is a concrete configuration in $\Gamma$ that satisfies (resp. falsifies) $f_i$. Notice that the same symbolic configuration $\Gamma$ may be extended to many extended configurations. Now, we have to lift the predicate transformers $post_\tau$ and $post_\theta$ to extended configurations. Let us first deal with the simple case of $post_\tau$. We define $post'_\tau$ as the function that takes an extended configuration $(\Gamma, \nu)$ as argument and yields a set $post'_\tau((\Gamma, \nu))$ of extended configurations as result. The set $post'_\tau((\Gamma, \nu))$ satisfies the following conditions:

1. if $(\Gamma', \nu') \in post'_\tau((\Gamma, \nu))$, then $post_\tau(\Gamma) = \Gamma'$;

2. if $(\Gamma', \nu') \notin post'_\tau((\Gamma, \nu))$, then there is no configuration in $\Gamma'$ that evaluates $(f_1, \ldots, f_n)$ to $\nu'$.
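The two conditions can be illustrated on an explicit-state toy model in Python, where a "symbolic" configuration is simply a finite set of concrete configurations and the predicates $f_i$ are Python functions. This is our own simplification, not the PDBM-based construction. The function returns exactly the valuations of $(f_1, \ldots, f_n)$ realized in the successor, which is the smallest set allowed by conditions 1 and 2; the names `post_tau` and `post_tau_extended` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Config = Tuple[int, ...]
Valuation = Tuple[bool, ...]
Predicate = Callable[[Config], bool]


def post_tau(gamma: FrozenSet[Config], guard: Predicate,
             update: Callable[[Config], Config]) -> FrozenSet[Config]:
    """Explicit-state stand-in for post_tau on a finite set of configurations."""
    return frozenset(update(c) for c in gamma if guard(c))


def post_tau_extended(gamma: FrozenSet[Config], nu: Valuation, guard: Predicate,
                      update: Callable[[Config], Config],
                      preds: List[Predicate]) -> Set[Tuple[FrozenSet[Config], Valuation]]:
    """Extended successors of (gamma, nu); nu does not constrain the result here."""
    target = post_tau(gamma, guard, update)                # condition 1: same Gamma'
    realized = {tuple(f(c) for f in preds) for c in target}
    return {(target, v) for v in realized}                 # condition 2: unrealized nu' excluded
```

For example, incrementing the set $\{0, 1, 2\}$ with the single predicate $x \geq 2$ yields two extended successors, one for each truth value.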



Now, the symbolic reachability graph contains an edge labeled by $\tau$ from $(\Gamma, \nu)$ to every extended configuration in $post'_\tau((\Gamma, \nu))$. The definition of $post'_\theta$ is similar, except that the symbolic reachability graph also contains edges connecting all configurations in $post'_\theta((\Gamma, \nu))$.

We can now explain how to translate the concrete fairness conditions into fairness conditions on the symbolic reachability graph. Assume that $(\tau, f)$ is in the set W of constrained weak fairness conditions. Then, when verifying a property on the symbolic reachability graph, we only need to consider paths that contain infinitely many $\tau$-edges if, from some point on, they only contain extended configurations where the boolean associated with f evaluates to true. In other words, $W_{SG}$ contains the constrained edge $(e_\tau, b_f)$, where $e_\tau$ stands for the edges labeled by $\tau$ and $b_f$ is the boolean associated with f. Similarly, for a set $C_f = \{(\tau_i, f_i) \mid 1 \leq i \leq n\} \in B$ expressing a boundedness condition, we have $\{(e_{\tau_i}, b_{f_i}) \mid 1 \leq i \leq n\} \in B_{SG}$.
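On an ultimately periodic path (a lasso $prefix \cdot cycle^\omega$), "from some point on" reduces to "everywhere on the cycle", so a constrained edge $(e_\tau, b_f)$ can be checked directly on the cycle. The Python sketch below shows that check; the lasso representation and the names `Step`, `justice_ok` and `fair_lasso` are our own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One edge of a lasso's cycle: its label and the booleans true at its source."""
    label: str
    true_bools: FrozenSet[str]


def justice_ok(cycle: List[Step], tau: str, b_f: str) -> bool:
    """Constrained edge (e_tau, b_f) on the infinite path prefix . cycle^omega."""
    b_f_forever = all(b_f in s.true_bools for s in cycle)  # true from some point on
    tau_inf_often = any(s.label == tau for s in cycle)
    return tau_inf_often or not b_f_forever


def fair_lasso(cycle: List[Step], w_sg: Iterable[Tuple[str, str]]) -> bool:
    """A lasso is kept only if it satisfies every constrained edge in W_SG."""
    return all(justice_ok(cycle, tau, b) for tau, b in w_sg)
```

A cycle that never takes `tau0` while `b_f` stays true is discarded as unfair; it is kept as soon as `tau0` occurs on the cycle or `b_f` becomes false somewhere on it.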

6 Synthesis of Fairness Constraints for Elementary Loops 

We explain how boundedness conditions can be synthesized for the loops of the symbolic graph that do not correspond to any legal infinite computation of the original fair PEA.

Let us consider the simplest case of a self-loop $\theta$ (i.e., a loop with only one transition). To show that $\theta$ is finitely (boundedly) iterable, the idea is to show that for all configurations $\gamma_0$ belonging to the initial symbolic configuration, all execution paths resulting from iterating $\theta$ stop. Now, suppose that, given an arbitrary number $n > 0$, we can compute the configuration $\theta^n(\gamma_0)$ obtained after n iterations of $\theta$ starting from $\gamma_0$. Then, deciding whether $\theta$ is boundedly iterable reduces to deciding whether, for all initial configurations $\gamma_0$, there exists $n > 0$ such that $\theta^n(\gamma_0)$ does not satisfy the guard of $\theta$.
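The criterion above can be tested on sample initial configurations by simulation, as in the following Python sketch. It is a finite, bounded test rather than a decision procedure: the method in the text quantifies over all initial configurations symbolically. The guard $x \geq 1$ and update $x := x - 2$ used in the example are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, ...]


def stop_index(x0: Config, guard: Callable[[Config], bool],
               effect: Callable[[Config], Config], max_n: int = 10_000) -> Optional[int]:
    """Smallest n such that theta^n(x0) violates the guard, or None within max_n."""
    x = x0
    for n in range(max_n + 1):
        if not guard(x):
            return n  # n == 0 means theta cannot be taken at all from x0
        x = effect(x)
    return None


def boundedly_iterable_on(initial: Iterable[Config], guard, effect,
                          max_n: int = 10_000) -> bool:
    """Finite-sample test: every sampled initial configuration eventually stops."""
    return all(stop_index(x0, guard, effect, max_n) is not None for x0 in initial)
```

For configurations $(x_0, x)$ with guard $x \geq 1$ and update $x := x - 2$, starting from $x = 5$ the loop stops after 3 iterations ($5 \to 3 \to 1 \to -1$).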

We will show that, under some conditions, the analysis of complex connected components of the symbolic graph can be reduced to the above idea for the analysis of self-loops. For this, we proceed by induction on the complexity of the connected component C to be analyzed. The simplest case is that of elementary circuits, for which we define the degree of complexity $degree(C) = 1$. For a non-elementary connected component, its degree is defined inductively by $degree(C) = \max_{C'} degree(C') + 1$, where $C'$ ranges over the connected (strict) subcomponents of C. In this section we consider elementary circuits.
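As a small illustration of the inductive definition, the Python sketch below computes the degree of a component represented as a tree of its strict connected subcomponents. The tree representation and the class `Component` are our own assumptions; in the symbolic graph, the subcomponents would first have to be extracted from the graph structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical tree view of a component: its strict connected subcomponents."""
    name: str
    subcomponents: List["Component"] = field(default_factory=list)


def degree(c: Component) -> int:
    """1 for an elementary circuit, else 1 + max degree of strict subcomponents."""
    if not c.subcomponents:
        return 1
    return 1 + max(degree(s) for s in c.subcomponents)
```

For instance, an outer loop containing two elementary self-loops has degree 2.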

6.1 Computation of the Effect 

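Let $\theta = \Gamma_0 \tau_0 \Gamma_1 \tau_1 \Gamma_2 \cdots \tau_{k-1} \Gamma_k$ be a computation sequence of the symbolic graph such that $\theta$ is an elementary circuit, i.e., $\Gamma_0 \sqsupseteq \Gamma_k$ and $\forall 0 \leq i < j < k.\ \Gamma_i \neq \Gamma_j$.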

Let $\tau = (q_1, g(X, P), sop, q_2)$ be a transition, and let sop correspond to the assignment $x := A_\tau x + b_\tau$ of the counters (including $x_0$), where $A_\tau$ is an $(n+1) \times (n+1)$ simple matrix and $b_\tau$ is an $(n+1)$-vector of terms in $AT(P)$.

We can lift operations on vectors of counters to operations on PDBMs as follows. First, we transform the vector $b_\tau$ into an $(n+1) \times (n+1)$ matrix of terms $B_\tau$ defined by $\forall 0 \leq i, j \leq n.\ B_\tau[i,j] = b_\tau(i) - b_\tau(j)$. Then, applying the assignment $x := A_\tau x + b_\tau$ to a set of configurations described by a PDBM M yields the PDBM $M' = A_\tau * M * A_\tau^T + B_\tau$, where $A_\tau^T$ is the transpose of $A_\tau$.
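A numeric instance of $M' = A_\tau * M * A_\tau^T + B_\tau$ can be computed with plain Python lists. This sketch makes several assumptions that are ours: bounds are finite integers (no strictness flags, no symbolic terms), and every row of the simple matrix $A_\tau$ contains exactly one 1, so that a reset $x_i := c$ is written as $x_i := x_0 + c$.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(r) for r in zip(*X)]


def b_to_B(b: Sequence[int]) -> Matrix:
    """B_tau[i][j] = b_tau(i) - b_tau(j)."""
    return [[bi - bj for bj in b] for bi in b]


def apply_assignment(M: Matrix, A: Matrix, b: Sequence[int]) -> Matrix:
    """DBM image of x := A x + b, computed as A * M * A^T + B."""
    AMAt = matmul(matmul(A, M), transpose(A))
    return [[u + v for u, v in zip(r, s)] for r, s in zip(AMAt, b_to_B(b))]
```

On the zone $0 \leq x_1 \leq 5$, the update $x_1 := x_1 - 2$ gives $-2 \leq x_1 \leq 3$, and the reset $x_1 := 0$ (row $(1, 0)$ in A) gives $x_1 = 0$.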

In this way, the effect of a transition of the symbolic graph on a PDBM is given by a pair of matrices $(A, B) \in \mathcal{A} \times \mathcal{B}$, where $\mathcal{A}$ is the class of simple matrices and $\mathcal{B}$ is defined by:

$\mathcal{B} = \{B \mid \forall 0 \leq i, j \leq n.\ B[i,j] \in AT(P \cup N) \wedge B[i,j] = -B[j,i] \wedge \forall 0 \leq k \leq n.\ B[i,k] + B[k,j] = B[i,j]\}$

Matrices in $\mathcal{A}$ and $\mathcal{B}$ have useful properties: $\mathcal{A}$ is closed under matrix multiplication, and $\mathcal{B}$ is closed under addition and under multiplication with matrices in $\mathcal{A}$. Notice that matrices in $\mathcal{B}$ represent parameterized configurations of counters (each encodes a single vector of terms). We define the mapping $cf : \mathcal{B} \rightarrow AT(P \cup N)^{n+1}$, which maps each matrix in $\mathcal{B}$ to the corresponding parameterized configuration of the counters.
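For numeric matrices, membership in $\mathcal{B}$ and the mapping cf can be sketched as follows. Since $x_0$ is fixed to 0, a matrix built from a vector v with $v_0 = 0$ gives back v in its column 0; reading cf off column 0 is our choice of implementation, and only concrete integers (no parametric terms) are handled.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def in_class_B(B: Matrix) -> bool:
    """Antisymmetry B[i][j] = -B[j][i] and additivity B[i][k] + B[k][j] = B[i][j]."""
    n = len(B)
    return all(B[i][j] == -B[j][i] and
               all(B[i][k] + B[k][j] == B[i][j] for k in range(n))
               for i in range(n) for j in range(n))


def from_vector(v: Sequence[int]) -> Matrix:
    """Matrix of pairwise differences v_i - v_j encoding the counter vector v."""
    return [[vi - vj for vj in v] for vi in v]


def cf(B: Matrix) -> List[int]:
    """Counter vector encoded by B, read off column 0 (x_0 is always 0)."""
    return [B[i][0] for i in range(len(B))]
```

For example, the vector $(0, -2)$ is encoded as $\begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}$, whereas an ordinary zone such as $0 \leq x_1 \leq 5$ is not in $\mathcal{B}$.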

Using these properties of $\mathcal{A}$ and $\mathcal{B}$, it is easy to show that the effect of a sequence of transitions $\tau_0 \tau_1 \ldots$ on a PDBM is also given by a pair of matrices $(A, B) \in \mathcal{A} \times \mathcal{B}$. Consequently, starting with a PDBM $M_0 \in \Gamma_0$ and iterating $\theta$ once gives a new PDBM $\theta(M_0) = A_\theta * M_0 * A_\theta^T + B_\theta$, where the pair $(A_\theta, B_\theta) \in \mathcal{A} \times \mathcal{B}$ accumulates the effect of the sequence of transitions composing $\theta$. The following proposition states the conditions under which $\theta^n(M_0)$ can be computed.
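The accumulation of effects along a sequence of transitions follows from expanding $A_2 (A_1 M A_1^T + B_1) A_2^T + B_2$, which gives the pair $(A_2 A_1,\ A_2 B_1 A_2^T + B_2)$. The Python sketch below implements this composition on numeric matrices (our simplification) and checks it against applying the two effects one after the other.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Effect = Tuple[Matrix, Matrix]  # (A, B)


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(r) for r in zip(*X)]


def madd(X: Matrix, Y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def apply_effect(e: Effect, M: Matrix) -> Matrix:
    """M |-> A * M * A^T + B."""
    A, B = e
    return madd(matmul(matmul(A, M), transpose(A)), B)


def compose(first: Effect, second: Effect) -> Effect:
    """Effect of `first` followed by `second`: (A2 A1, A2 B1 A2^T + B2)."""
    A1, B1 = first
    A2, B2 = second
    return matmul(A2, A1), madd(matmul(matmul(A2, B1), transpose(A2)), B2)
```

For instance, composing the hypothetical updates $x_1 := x_1 - 2$ and $x_1 := x_1 + 3$ gives the single effect $x_1 := x_1 + 1$.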

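Proposition 1. Let $\theta = \Gamma_0 \tau_0 \Gamma_1 \tau_1 \Gamma_2 \cdots \tau_{k-1} \Gamma_k$ be an elementary circuit in a symbolic graph such that $\Gamma_0 = (q_0, (M_0, \phi_0))$ and the effect of the sequence of transitions $\tau_0, \ldots, \tau_{k-1}$ is given by the matrices $(A_\theta, B_\theta)$. If the PDBM $\Delta_\theta = A_\theta * M_0 * A_\theta^T + B_\theta - M_0$ satisfies the conditions:

- SC1: $A_\theta * \Delta_\theta * A_\theta^T = \Delta_\theta$, and

- SC2: $\Delta_\theta$ is a closed parameterized configuration for counters, i.e., $\Delta_\theta \in \mathcal{B}$ and $\Delta_\theta$ is closed (does not contain variables in N),

then $\forall x_0 \in [\![M_0, \phi_0]\!].\ \forall n > 0.\ \theta^n(x_0) = x_0 + n * cf(\Delta_\theta)$.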

Both conditions SC1 and SC2 correspond to conjunctions of equalities between arithmetical expressions over integers. These formulas are decidable in the linear case, i.e., for LFO formulas. Condition SC1 is similar to the first condition for exactness of extrapolation (see Section 4.2). Condition SC2 means that the effect of each iteration is deterministic, that is, each iteration of $\theta$ adds the same (parametric) value, defined by $\Delta_\theta$.
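For numeric matrices, the hypotheses of Proposition 1 can be checked mechanically, and the closed form $x_0 + n * cf(\Delta_\theta)$ compared with direct iteration. The Python sketch below does this under our own simplifications: all entries are concrete integers, so SC2's closedness requirement holds trivially and only membership in $\mathcal{B}$ is tested. The zone $M_0$ ($0 \leq x \leq 5$) is a hypothetical initial configuration; the loop $x := x - 2$ mirrors Example 2.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(r) for r in zip(*X)]


def delta_theta(A: Matrix, B: Matrix, M0: Matrix) -> Matrix:
    """Delta_theta = A * M0 * A^T + B - M0."""
    AMAt = matmul(matmul(A, M0), transpose(A))
    return [[a + b - m for a, b, m in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(AMAt, B, M0)]


def sc1(A: Matrix, D: Matrix) -> bool:
    """SC1: A * D * A^T = D."""
    return matmul(matmul(A, D), transpose(A)) == D


def sc2(D: Matrix) -> bool:
    """SC2 for numeric matrices: D encodes a single counter vector."""
    n = len(D)
    return all(D[i][j] == -D[j][i] and D[i][k] + D[k][j] == D[i][j]
               for i in range(n) for j in range(n) for k in range(n))


def iterate_closed_form(x0: List[int], D: Matrix, n: int) -> List[int]:
    """theta^n(x0) = x0 + n * cf(D), with cf(D) read off column 0."""
    return [xi + n * D[i][0] for i, xi in enumerate(x0)]
```

With $A_\theta = I$ and $B_\theta$ encoding $x := x - 2$, both conditions hold and three iterations from $x = 7$ give $x = 1$. A reset loop ($x := 0$) fails SC2, because its $\Delta_\theta$ depends on the initial zone rather than being a single vector.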

Example 2. We apply Proposition 1 to the extended automaton of Figure 2(a), whose symbolic graph contains only one self-loop $\theta = \Gamma_0 t_1 \Gamma_0$. For this circuit,

$A_\theta = I, \quad B_\theta = \begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}, \quad \Delta_\theta = A_\theta * M_0 * A_\theta^T + B_\theta - M_0 = B_\theta,$

which trivially satisfies conditions SC1 and SC2. By Proposition 1, we have $\theta^n(x_0) = x_0 + n * (0, -2)$, since $cf(\Delta_\theta) = (0, -2)$.
[Figure 2: (a) a parameterized extended automaton with one control state and a self-loop $t_1$ that decrements its counter by 2; (b) its symbolic graph, a single vertex $\Gamma_0$ with a self-loop.]

Fig. 2. A Simple Parameterized Extended Automaton (a) and Its Symbolic Graph (b).



Remark 1. The computation presented above can be extended to Parameterized Timed Automata for time-deterministic circuits. Since the evolution of clock variables is determined not only by the operations on transitions but also by the guards and the invariants, the computation of the effect of loops must take these elements into account. However, $\Delta_\theta$ is already computed during the symbolic reachability algorithm, for extrapolation. We only have to test, in addition, that $\Delta_\theta$ satisfies condition SC2.

6.2 Synthesis of Fairness Constraints 

We say that an elementary circuit $\theta$ is boundedly iterable in the configuration $\Gamma_0$ if there is an integer $n > 0$ such that $\theta^n(\Gamma_0) = \emptyset$.

By Proposition 1, we know that for every parametric configuration $x_0 \in \Gamma_0$, $\theta^n(x_0) = x_0 + n * cf(\Delta_\theta)$. Let us denote by $\tau_{[0,i-1]}$ the sequence of transitions $\tau_0 \ldots \tau_{i-1}$, for any $i \in \{0, \ldots, k-1\}$, and by $\tau_{[0,i-1]}(x)$ the effect of the sequence of transitions $\tau_0 \ldots \tau_{i-1}$ on x (if $i = 0$, we consider that no transition is applied). Then, we have:

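Proposition 2. Let $\theta$ be an elementary circuit such that $\Delta_\theta$ satisfies SC1 and SC2. Then $\theta$ is boundedly iterable iff $\forall x_0 \in \Gamma_0.\ \exists n > 0.\ \exists i \in \{0, \ldots, k-1\}.\ \tau_{[0,i-1]}(x_0 + n * cf(\Delta_\theta)) \not\models guard(\tau_i)$.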

The formula given by Proposition 2 has an equivalent form in terms of operations on PDBMs:

$\forall P\, \forall N\, \forall X_0.\ \phi_0(P, N) \wedge M_0(P, N, X_0) \Rightarrow \exists n > 0.\ \exists i \in \{0, \ldots, k-1\}.\ \tau_{[0,i-1]}(M_0 + n * \Delta_\theta) \cap guard(\tau_i) = \emptyset$

The validity of this formula is decidable in the linear case, since all variables are integers. If the formula is valid, we synthesize a boundedness constraint $T(\theta) \in B_{SG}$ for the set $T(\theta) = \{\tau_0, \ldots, \tau_{k-1}\}$ of transitions in $\theta$.

From the test given in Proposition 2, we can build a formula $\phi^\theta_i(x, n)$ giving the condition under which, after iterating $\theta$ n times on the configuration $x \in \Gamma_0$, the execution stops at the transition $\tau_i$:

$\phi^\theta_i(x_0, n) = \tau_{[0,i-1]}(x_0 + n * cf(\Delta_\theta)) \not\models guard(\tau_i)$

which is easy to translate in terms of operations on PDBMs.
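The search for a stopping point $(n, i)$ described by Proposition 2 and by $\phi^\theta_i$ can be sketched in Python for one concrete configuration: compute $\theta^n(x_0)$ by the closed form of Proposition 1, then replay the prefixes $\tau_{[0,i-1]}$ until some guard fails. This is a bounded, per-configuration search under our own assumptions (transitions given as Python guard/update pairs); the actual test decides the quantified formula. The circuit in the example, a counter guarded by $x < M$ with the hypothetical value $M = 5$, is modelled on the counter part of Example 3.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Config = Tuple[int, ...]
Transition = Tuple[Callable[[Config], bool], Callable[[Config], Config]]  # (guard, update)


def stopping_point(x0: Config, cf_delta: Sequence[int], circuit: List[Transition],
                   max_n: int = 10_000) -> Optional[Tuple[int, int]]:
    """First (n, i) with tau_[0,i-1](x0 + n * cf(Delta)) violating guard(tau_i)."""
    for n in range(1, max_n + 1):
        x = tuple(a + n * d for a, d in zip(x0, cf_delta))  # theta^n(x0), Proposition 1
        for i, (guard, update) in enumerate(circuit):
            if not guard(x):
                return n, i  # phi^theta_i(x0, n) holds
            x = update(x)    # x is now tau_[0,i] applied to theta^n(x0)
    return None


# Hypothetical one-transition circuit over (x0, x): guard x < 5, update x := x + 1.
counter_loop: List[Transition] = [(lambda c: c[1] < 5, lambda c: (c[0], c[1] + 1))]
```

From $x = 1$ with $cf(\Delta_\theta) = (0, 1)$, the circuit stops at $n = 4$ on its only transition, since $1 + 4 = 5$ violates $x < 5$.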
Example 3. We consider here an example inspired by the model of the Bounded Retransmission Protocol (BRP). It is a system with two clocks $c_1$ and $c_2$ and a counter x (see Figure 3(a)). Intuitively, $c_1$ represents the clock of a sender and $c_2$ the clock of a receiver. These clocks are compared with the parametric bounds $T_1$ and $T_2$, which are assumed to be real values. The counter x counts the number of times the loop $q_0 t_1 q_0$ is performed. In BRP, the transition $t_1$ corresponds to a retransmission by the sender. The number of these retransmissions is bounded by the integer parameter M. The question is whether the state $q_1$ (considered as a bad state) is reachable. Roughly speaking, this corresponds to a synchronization property between the sender and the receiver in BRP: the timeout of the receiver should not expire before the sender has finished all its retransmissions.

[Figure 3: (a) the automaton, with $t_0$: reset($c_1$), reset($c_2$); self-loop $t_1$ on $q_0$: $c_1 = T_1$, $x < M$ / reset($c_1$), $x := x + 1$; $t_2$: $c_2 = T_2$, $x < M$; invariants $c_1 \leq T_1$, $c_2 \leq T_2$. (b) its symbolic graph: $\Gamma_0$ with the self-loop $t_1$, and $t_2$ leading to $\Gamma_1$.]

Fig. 3. An Elementary Circuit of BRP.

The symbolic graph generated for this example is given in Figure 3(b), where $\Gamma_0 = (q_0, S_0)$ and $\Gamma_1 = (q_1, S_1)$ with

$S_0 = (M_0 = (0 \leq c_1 \leq T_1 \wedge (n+1) * T_1 \leq c_2 \leq (n+2) * T_1 \wedge c_2 - c_1 = (n+1) * T_1 \wedge x = n + 1),$
$\quad\ \ \phi_0 = (0 < n \wedge 0 < T_1 \wedge (n+2) * T_1 < T_2 \wedge n < M))$

$S_1 = (M_1 = (0 \leq c_1 \leq T_1 \wedge T_2 \leq c_2 \wedge c_2 - c_1 = (n+1) * T_1 \wedge x = n + 1),$
$\quad\ \ \phi_1 = (0 < n \wedge 0 < T_1 \wedge (n+1) * T_1 < T_2 < (n+2) * T_1 \wedge n + 1 < M))$

The presence of the unfair loop $\theta = \Gamma_0 t_1 \Gamma_0$ in $\Gamma_0$ makes the verification of the reachability property for $q_1$ fail. By analyzing $\theta$, we find that $\Delta_\theta$ (on both clocks and counters) is a parameterized configuration with $\Delta_\theta(c_1) = 0$, $\Delta_\theta(c_2) = T_1$, $\Delta_\theta(c_2 - c_1) = T_1$, and $\Delta_\theta(x) = 1$. By Proposition 2, $\theta$ is boundedly iterable iff the following formula is valid:

$\forall T_1, T_2, c_1, c_2.\ \forall M, x, n.\ \phi_0 \wedge M_0 \Rightarrow \exists n' > 0.\ x + n' > M$

After eliminating the real variables ($T_1, T_2, c_1, c_2$), we obtain the following linear formula on integer variables:

$\forall M, x, n.\ 0 < n \wedge n < M \wedge x = n + 1 \Rightarrow \exists n' > 0.\ x + n' > M$

which is clearly valid. Hence, we can generate a boundedness condition for this loop, stating that if $t_1$ is taken infinitely often, then transitions different from $t_1$, including $t_2$, are also taken infinitely often.
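The final integer formula can be sanity-checked in Python on a finite box of values by exhibiting an explicit witness $n' = \max(1, M - x + 1)$. This is only a bounded check with a witness we chose ourselves; the validity claimed in the text rests on quantifier elimination over all integers.

```python
def witness(M: int, x: int) -> int:
    """An n' > 0 with x + n' > M (our explicit choice of witness)."""
    return max(1, M - x + 1)


def check_box(bound: int) -> bool:
    """Check the implication for all integers with 0 < n < M <= bound and x = n + 1."""
    for M in range(bound + 1):
        for n in range(1, M):
            x = n + 1
            n_prime = witness(M, x)
            if not (n_prime > 0 and x + n_prime > M):
                return False
    return True
```

For $M = 10$ and $x = 3$, the witness is $n' = 8$, which gives $x + n' = 11 > 10$.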



7 Synthesis of Fairness Constraints for Nested Loops 

Generating fairness constraints from the analysis of elementary circuits suffices for a wide class of systems (e.g., BRP). However, some cases require reasoning about arbitrary combinations of elementary circuits, i.e., about connected components.

Consider, for example, the program of Figure 4 and its PEA model.




    x := X0; y := Y0; z := 0;
    while x > 1 do
        z := y;
        while z > 1 do z := z - 2; od;
        if z = 0
        then x := x - 1; y := y + 3;
        else y := y - 1; z := 0;
        fi;
    od;

Fig. 4. Example of Nested Loops.



Fig. 5. Symbolic Graph for the Example of Nested Loops.

Table 1. Main Symbolic Configurations for Graph on Figure 5.

The approaches [?] based on ranking functions fail to prove the termination of this program. Indeed, the ranking functions synthesized for the innermost and the outermost while statements cannot be ordered lexicographically according to the structure of the program: the behaviour of the inner loop depends on the value of y computed in the outermost loop, whose behaviour in turn depends on the value of z computed in the innermost loop. In our approach, the reachability algorithm applied to the extended automaton of Figure 4 generates the symbolic graph given in Figure 5 and Table 1. In this graph, dotted transitions represent sequences of transitions. For example, configuration $\Gamma_1$ is reached from $\Gamma_0$ by executing the sequence $t_0, t_1$. The symbolic graph has several self-loops built from transition $t_2$. Using the method presented in Section 6, we can infer that these loops are boundedly iterable, which means that the innermost while statement always terminates. The connected components starting in $\Gamma_?$ and $\Gamma_?$ correspond to the outermost while statement for odd and even values of the parameter $Y_0$, respectively. Indeed, from the initial configuration $\Gamma_0$, we unfold the outermost while statement once. After executing the sequence [...] (the parameter $n_0$ is introduced by iterating $t_2$ in $\Gamma_2$), we have two cases depending on which transition ($t_3$ or $t_4$) is chosen in configuration $\Gamma_2$. If transition $t_3$ is chosen, we deduce that $Y_0$ is odd (see constraint $\phi_?$ in Table 1). By unfolding the outermost while statement once more (i.e., the sequence $t_1 t_2 t_2 t_4$), we generate a configuration that differs from $\Gamma_?$ by the vector $(x_0 = 0, x = -1, y = 2, z = 0)$. After two more unfoldings, the outermost while statement produces the same effect. The extrapolation technique generates the configuration $\Gamma_?$ and the connected component $t_1 t_2 t_2 t_3 t_1 t_2 t_2 t_4$ starting in $\Gamma_?$. The effect of each of the components starting in $\Gamma_?$ and $\Gamma_?$ is to add the vector $cf(\Delta) = (0, -1, 2, 0)$, which satisfies conditions SC1 and SC2 of Proposition 1. This is because the effect of the innermost loops is reset by the transitions $t_?$ and $t_?$. However, these components are not elementary circuits. Hence, we need to generalize the techniques of Section 6 in order to deal with a whole connected component. We show hereafter that, under some conditions, reasoning about connected components can be reduced to reasoning about elementary circuits.
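The effect $(0, -1, 2, 0)$ of two unfoldings of the outer loop can be observed on concrete runs. The Python sketch below executes the program of Figure 4 as we read it from the extraction (initialization $x := X_0$, $y := Y_0$, $z := 0$; outer guard $x > 1$; inner loop $z := z - 2$ while $z > 1$) and records the state after each outer iteration. We assume $Y_0 \geq 0$, since for negative $Y_0$ this reading of the program does not terminate. The simulation only illustrates the claim for concrete values; it is not a termination proof.

```python
from typing import List, Tuple

State = Tuple[int, int, int]  # (x, y, z)


def run_figure4(X0: int, Y0: int, max_outer: int = 100_000) -> List[State]:
    """Execute the nested loops; return the state after each outer iteration."""
    if Y0 < 0:
        raise ValueError("this sketch assumes Y0 >= 0")
    x, y, z = X0, Y0, 0
    states = [(x, y, z)]
    while x > 1:
        z = y
        while z > 1:       # innermost loop
            z -= 2
        if z == 0:         # y was even
            x, y = x - 1, y + 3
        else:              # y was odd
            y, z = y - 1, 0
        states.append((x, y, z))
        if len(states) > max_outer:
            raise RuntimeError("outer loop did not stop within the bound")
    return states


def two_step_effects(states: List[State]) -> List[Tuple[int, ...]]:
    """Differences between states two outer iterations apart (start, 2, 4, ...)."""
    pairs = states[::2]
    return [tuple(b - a for a, b in zip(s, t)) for s, t in zip(pairs, pairs[1:])]
```

For $X_0 = 5$ and $Y_0 = 3$ (odd), the run ends in $(1, 11, 0)$, and every pair of consecutive outer iterations changes $(x, y, z)$ by $(-1, 2, 0)$; the same holds for even $Y_0$, which matches the two connected components discussed above.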



7.1 Computation of the Effect 

Case 1. Consider the (simpler) case of a connected component C having a unique elementary circuit $\theta$ containing the initial configuration of C, $\Gamma_0 = (q_0, (M_0, \phi_0))$. The connected subcomponents of C different from $\theta$ may share zero or more symbolic configurations with $\theta$ (see the first part of Figure 6).

We build from C a connected component $C_\theta$ having the same degree and the same execution paths as C, but in which the connected subcomponents different from $\theta$ share at most one symbolic configuration with $\theta$. For this, let $\theta = \Gamma_0 \tau_0 \Gamma_1 \tau_1 \Gamma_2 \ldots \tau_{k-1} \Gamma_0$. We order the configurations of $\theta$ according to the transition relation, i.e., $\Gamma_0 \prec \Gamma_1 \prec \Gamma_2 \prec \cdots \prec \Gamma_{k-1}$. Since $\theta$ is the unique elementary circuit in C containing $\Gamma_0$, the unique path relating $\Gamma_i$ to $\Gamma_j$ with $i < j$ is contained in $\theta$. Let $\Gamma_i$ be a configuration of $\theta$ shared with a subcomponent of C. We denote by $CC(\Gamma_i)$ the maximal connected subcomponent of C not including $\theta$ such that $\Gamma_i$ is the minimal (in the sense of $\prec$) configuration of $CC(\Gamma_i)$ shared with $\theta$. Then, $C_\theta$ is built from $\theta$ by attaching, for each configuration $\Gamma_i$ of $\theta$, a copy of $CC(\Gamma_i)$ whose only configuration shared with $\theta$ is $\Gamma_i$.





[Figure 6 (panel label): $\hat\theta = \theta$ + meta-transitions]



By construction, C and $C_\theta$ have the same execution paths. This means that if $C_\theta$ is boundedly iterable, C is also boundedly iterable. The result of iterating C once on a configuration x is defined by $C(x) = \{x' \mid \exists n_1 \ldots n_{k-1} \geq 0.\ x' = (\tau_{k-1} \circ CC(\Gamma_{k-1})^{n_{k-1}} \circ \cdots \circ \tau_1 \circ CC(\Gamma_1)^{n_1} \circ \tau_0)(x)\}$. In general, this set may have cardinality greater than 1, since the iteration parameters introduce non-deterministic behaviours. In order to compute this set, we need additional conditions ensuring that applying C always gives the same result, i.e., that its effect is deterministic.

Suppose that it is possible to compute, for each subcomponent $CC(\Gamma_i)$ of $C_\theta$, the effect $\Delta_{CC(\Gamma_i)}$ satisfying the conditions of Proposition 1. Then, we can replace the subcomponent $CC(\Gamma_i)$ by a meta-transition $\mu_i$ whose operation is $x' := x + n * cf(\Delta_{CC(\Gamma_i)})$ with $n \geq 0$ (a new, unused $n \in N$ for each meta-transition). The meta-transition is introduced by splitting $\Gamma_i$ in two: $\Gamma'_i$ and $\Gamma''_i$, respectively the source and the target of $\mu_i$.

In this way, we obtain an elementary circuit, called $\hat\theta$, built from transitions of $\theta$ and meta-transitions involving iteration parameters $n \in N$. Proposition 1 can be applied to this circuit, and we obtain:



Proposition 3. If $\Delta_{\hat\theta} = A_{\hat\theta} * M_0 * A_{\hat\theta}^T + B_{\hat\theta} - M_0$ satisfies conditions SC1 and SC2, then $\forall x_0 \in [\![M_0, \phi_0]\!].\ \forall n > 0.\ C^n(x_0) = x_0 + n * cf(\Delta_{\hat\theta})$.

We recall that condition SC2 requires $\Delta_{\hat\theta}$ to be a configuration that contains no iteration parameters, so that the set $C(x)$ has cardinality 1. This is possible if the circuit $\hat\theta$ contains transitions with reset operations that cancel the effect of the meta-transitions.
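The role of resets can be illustrated in Python by computing one iteration of a circuit containing a meta-transition $x' := x + n * d$ for a range of values of its iteration parameter. This is a simplified sketch under our own assumptions: guards are ignored, n is drawn from a finite range, and the counters $(x, z)$ and step effects are hypothetical (loosely modelled on the nested-loop example). With a reset after the meta-transition, $C(x)$ is a singleton; without it, each value of n gives a different result, which corresponds to the accumulated case of Remark 2 below.

```python
from typing import Callable, List, Sequence, Set, Tuple

Config = Tuple[int, ...]


def meta(d: Sequence[int]) -> Callable[[Config, int], Config]:
    """Meta-transition x' := x + n * d for an iteration parameter n >= 0."""
    return lambda x, n: tuple(a + n * b for a, b in zip(x, d))


def one_iteration_results(x: Config, steps: List[Tuple[str, Callable]],
                          n_range: range) -> Set[Config]:
    """C(x) with every meta-transition parameter drawn from n_range."""
    results = {x}
    for kind, f in steps:
        if kind == "meta":
            results = {f(c, n) for c in results for n in n_range}
        else:
            results = {f(c) for c in results}
    return results


# Hypothetical circuits over (x, z): an inner loop z := z - 2 as a meta-transition,
# followed by x := x - 1 with (with_reset) or without (no_reset) the reset z := 0.
with_reset = [("meta", meta((0, -2))), ("plain", lambda c: (c[0] - 1, 0))]
no_reset = [("meta", meta((0, -2))), ("plain", lambda c: (c[0] - 1, c[1]))]
```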

Example 4. Consider the connected component C containing $\Gamma_?$ in the symbolic graph of Figure 5. It corresponds to an odd initial value of $Y_0$ and has only one elementary circuit $\theta'$ containing $\Gamma_?$. The elementary loops in $\Gamma_?$ and $\Gamma_5$ can be reduced to meta-transitions. Moreover, the effect computed using $\hat\theta'$ is a configuration with $\Delta_C(x) = -1$, $\Delta_C(y) = 2$, and $\Delta_C(z) = 0$. By Proposition 3, we have $\forall x_0 \in \Gamma_?.\ \forall n > 0.\ C^n(x_0) = x_0 + n * cf(\Delta_C)$. The same computation can be done for the connected component corresponding to even values of $Y_0$.



Remark 2. In some cases, we can deal with the presence of iteration parameters in $\Delta_{\hat\theta}$. The simplest case is when all operations in C accumulate values in counters, i.e., all assignments have the form $x := x + t$ (there are no reset operations). Indeed, in the absence of reset operations, the (meta-)transitions in $\hat\theta$ accumulate their results, and the effect of the circuit is given by $C(x) = x + \sum_i n_i * cf(\Delta_i)$, where the $\Delta_i \in \mathcal{B}$ are built from the terms of the assignments in the (meta-)transitions.

Case 2. Consider a connected component C having several elementary circuits that include its initial configuration $\Gamma_0 = (q_0, (M_0, \phi_0))$. Let $\Theta(C)$ be the set of these elementary circuits. In order to analyze C, we build from C a connected component $\hat{C}$ having the same degree and execution paths as C, but in which each connected subcomponent including $\Gamma_0$ contains only one elementary circuit of $\Theta(C)$. Moreover, each pair of such subcomponents shares only $\Gamma_0$ (see Figure 7). For this, let $\theta_i$ be an elementary circuit in $\Theta(C)$. Together with $\theta_i$, we consider all maximal connected subcomponents that share states with $\theta_i$ and contain no other elementary circuit of $\Theta(C)$. These form the maximal connected subcomponent of C built around $\theta_i$. We denote this subcomponent by $CC(\theta_i)$, and the set of all these subcomponents by $EC(C) = \{CC(\theta_i) \mid \theta_i \in \Theta(C)\}$. Then, $\hat{C}$ is built from $EC(C)$ by splitting states so that the only state shared by each pair of subcomponents $CC(\theta_i)$ is the initial configuration $\Gamma_0$.

It is easy to show that C and $\hat{C}$ have the same execution paths. This means that if $\hat{C}$ is boundedly iterable, C is also boundedly iterable. The paths built by iterating $\hat{C}$ once on x are given by $\hat{C}(x) = \bigcup_{\theta_i \in \Theta(C)} (\bigcup_{j \neq i} CC(\theta_j))^* \, CC(\theta_i)$.

Now, if we consider each subcomponent $CC(\theta_i)$ of $\hat{C}$ separately, we can use Proposition 3 to compute its effect and reduce it to a meta-transition. In the best case, where all the subcomponents $CC(\theta_i)$ can be reduced to meta-transitions, we can compute, for each $\theta_i \in \Theta(C)$, the effect of $(\bigcup_{j \neq i} CC(\theta_j))^* \, CC(\theta_i)$, denoted $\Delta_{CC^+(\theta_i)}$. Then, iterating $\hat{C}$ an arbitrary number of times n increments x by a parameterized configuration chosen non-deterministically in the set $\{n_1 * \Delta_{CC^+(\theta_1)}, \ldots, n_l * \Delta_{CC^+(\theta_l)} \mid n_1 \geq 0, \ldots, n_l \geq 0\}$. To be able to characterize $\hat{C}^n$, we require that all these values be equal.

[Figure 7: two panels, "Connected component C" and "Connected component $\hat{C}$".]

Fig. 7. Building $\hat{C}$.

Theorem 1. Let C be a connected component with $degree(C) > 1$, and $\Gamma_0 = (q_0, (M_0, \phi_0))$ its initial configuration. If C satisfies the following two conditions:

- each strict subcomponent of C can be reduced to a meta-transition, and

- for every elementary circuit $\theta_j$ containing the initial state of C, the effect $\Delta_{CC^+(\theta_j)}$ satisfies conditions SC1 and SC2, and all these effects are equal to a closed parameterized configuration $\Delta_C$,

then $\forall x_0 \in [\![M_0, \phi_0]\!].\ \forall n > 0.\ C^n(x_0) = x_0 + n * cf(\Delta_C)$.

Proof. If there is only one elementary circuit including $\Gamma_0$, the result follows from Proposition 3. If C contains more than one elementary circuit including $\Gamma_0$, we compute its effect using $\hat{C}$, built as above. The execution of $\hat{C}$ is given by $\sum_i (\bigcup_{j \neq i} CC(\theta_j))^* \, CC(\theta_i)$, i.e., the sum of the executions including at least one elementary circuit. By hypothesis, we can compute each $\Delta_{CC^+(\theta_j)}$. Moreover, all the $\Delta_{CC^+(\theta_j)}$ satisfy SC1 and SC2 and are equal to $\Delta_C$. Then, by the definition of the execution of $\hat{C}$, for any number n of iterations of $\hat{C}$, executed in any order, $\hat{C}^n(\Gamma_0) = \Gamma_0 + n * \Delta_C$. Since C and $\hat{C}$ have the same execution paths and $\Delta_C$ represents a parameterized configuration (and not a zone), the result follows.
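The role of the "all effects equal" hypothesis of Theorem 1 can be seen in a small Python experiment: enumerate every order in which n iterations may pick among the circuits, each contributing its closed effect vector, and collect the results. This abstraction (each circuit summarized by a fixed vector, internal choices ignored) is our own simplification. When all effects coincide, the result is the single vector $x_0 + n * cf(\Delta_C)$; when they differ, several results appear and no closed form of that shape exists.

```python
from itertools import product
from typing import List, Set, Tuple

Vector = Tuple[int, ...]


def reachable_after(x0: Vector, effects: List[Vector], n: int) -> Set[Vector]:
    """All results of n iterations, each iteration picking any circuit's effect."""
    results = set()
    for choice in product(range(len(effects)), repeat=n):
        x = list(x0)
        for i in choice:
            x = [a + b for a, b in zip(x, effects[i])]
        results.add(tuple(x))
    return results


def theorem1_effects_equal(effects: List[Vector]) -> bool:
    """Second hypothesis of Theorem 1 on numeric effects (closedness assumed)."""
    return len(set(effects)) == 1
```

With two circuits both of effect $(0, -1, 2, 0)$, three iterations from $(0, 9, 3, 0)$ always yield $(0, 6, 9, 0)$.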

7.2 Synthesis of Fairness Constraints 

We say that a connected component C is boundedly iterable in the configuration $\Gamma_0$ if there is an integer $n > 0$ such that $C^n(\Gamma_0) = \emptyset$.

We now show how fairness constraints can be synthesized if C satisfies the hypothesis of Theorem 1. The idea is to synthesize a formula for C expressing the fact that all execution paths of C stop. By Theorem 1, we know that for every parametric configuration $x_0 \in \Gamma_0$, $C^n(x_0) = x_0 + n * cf(\Delta_C)$. Moreover, we can build from C the component $\hat{C}$ (see Figure 7), which has the same execution paths as C and disjoint connected subcomponents in $EC(C)$. For each connected subcomponent of $EC(C)$, let $\theta = \Gamma_0 \tau_0 \Gamma_1 \cdots \Gamma_{k-1} \tau_{k-1} \Gamma_0$ (in $\Theta(C)$) be its elementary circuit including the initial configuration of C. Following the method described in Section 7.1, we build from $\theta$ and the meta-transitions $\mu_i$ ($0 \leq i < k$) corresponding to the connected components $CC(\Gamma_i)$ the elementary circuit $\hat\theta = \Gamma'_0 \mu_0 \Gamma''_0 \tau_0 \cdots \Gamma'_{k-1} \mu_{k-1} \Gamma''_{k-1} \tau_{k-1} \Gamma'_0$. We denote by $\hat\theta_{[0,j-1]}$ the sequence $\mu_0 \tau_0 \ldots \mu_{j-1} \tau_{j-1} \mu_j$. Since all subcomponents of $\hat{C}$ can be reduced to meta-transitions, we can construct for each subcomponent $CC(\Gamma_j)$ the formula $\phi_l^{CC(\Gamma_j)}(x, n)$, which gives the condition under which, after iterating $CC(\Gamma_j)$ n times on the configuration $x \in \Gamma_j$, the execution of $CC(\Gamma_j)$ stops at its l-th configuration.

We show that we can construct from these formulas another formula $\phi^{\hat\theta}_j(x, n)$ stating that after iterating $\hat\theta$ n times on the configuration $x \in \Gamma_0$, the execution of $\hat\theta$ stops at its j-th configuration ($0 \leq j \leq k-1$). This formula is the disjunction of three formulas expressing all the possible cases in which the execution may stop in configuration $\Gamma_j$.

Case 1. None of the edges leaving $\Gamma_j$ and remaining in $CC(\theta)$ can be taken:

$\phi^{(1)}_j(x, n) = \exists m_0, \ldots, m_j \geq 0.\ \bigwedge_{\tau : \Gamma_j \xrightarrow{\tau} \Gamma,\ \tau \in CC(\theta)} \hat\theta_{[0,j-1]}(x + n * cf(\Delta_C)) \not\models guard(\tau)$

The integer variables $m_0, \ldots, m_j$ are the iteration parameters used in the meta-transitions of the sequence $\hat\theta_{[0,j-1]}$. We can quantify all these variables existentially because the symbolic graph is deterministic.

Case 2. The execution stops inside the connected subcomponent $CC(\Gamma_j)$, but not in the first configuration of this component, i.e., not in $\Gamma_j$:

$\phi^{(2)}_j(x, n) = \exists m_0, \ldots, m_j \geq 0.\ \bigvee_{l \neq 0} \phi_l^{CC(\Gamma_j)}(\hat\theta_{[0,j-1]}(x + n * cf(\Delta_C)), m)$

Case 3. The execution of $CC(\Gamma_j)$ does not terminate, which means that the hypothesis that all transitions in C are executed infinitely often is not satisfied:

$\phi^{(3)}_j(x, n) = \exists m_0, \ldots, m_j \geq 0.\ \forall m \geq 0.\ \neg \bigvee_l \phi_l^{CC(\Gamma_j)}(\hat\theta_{[0,j-1]}(x + n * cf(\Delta_C)), m)$



Proposition 4. Let $CC(\theta)$ be a connected component having a unique elementary circuit $\theta$ such that we can compute its effect $\Delta_{CC(\theta)}$. Then $CC(\theta)$ is boundedly iterable iff $\forall P\, \forall N\, \forall x_0.\ \Gamma_0(x_0, P, N) \Rightarrow \exists n > 0.\ \bigvee_{j \in \{0, \ldots, k-1\}} \phi^{\hat\theta}_j(x_0, n)$.

It is easy to show that this test can be translated into a test using operations on PDBMs. The resulting formula is decidable in the linear case.
The criterion for testing the bounded iterability of C is the conjunction of the formulas obtained for each component in $EC(C)$. If this criterion is valid, we can synthesize the boundedness constraint $T(C) \in B_{SG}$ for the set $T(C)$ of transitions in C.

Example of Nested Loops (cont.) Let us consider the symbolic graph of the nested-loops example. Let C' be the connected component starting in $F_1$, which corresponds to an odd initial value of $y_0$. Let θ' be the elementary circuit of C' containing $F_1$. The effect of C' may be computed since the elementary loops in $F_1$ and $F_2$ can be reduced to meta-transitions. Moreover, the effect of C' is constant and equal to $ef(A_C) = (0,-1,2,0)$ (see the beginning of this section). Hence Proposition 4 can be applied. The formulas built for cases 2 and 3 above are trivially false, since all the connected sub-components of C' stop. For case 1, the formulas built for the configurations other than $F_1$ and $F_2$ (i.e., for j = 0, 3) are also trivially false, since it is always possible to take a transition in C' from these configurations. For both configurations $F_1$ and $F_2$, we obtain the following formula involving the guard of the transition $t_1$:

$$\phi_{\theta'(1)}(x,y,z,n) \;=\; \big((x_0,x,y,z) + n \cdot ef(A_C)\big) \models guard(t_1) \;\equiv\; x - n < 0$$

By Proposition 4, C' is boundedly iterable iff the formula $\forall x_0, n_1\ \forall x, y, z.\ F_1 \Rightarrow \exists n > 0.\ x - n < 0$ is valid. Since in $F_1$ we have $x = x_0 - n_1 - 1$, this formula is valid: for all values of $x_0$ and $n_1$ there always exists an integer $n > 0$ such that $x_0 - n_1 - 1 - n < 0$. So the boundedness condition $T(C') \in B_{sg}$ is generated. The same analysis can be done for the component starting in $F_1'$.
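The last validity step can be illustrated with a small Python sketch. It assumes configurations are integer tuples $(x_0, x, y, z)$, applies the constant effect $ef(A_C) = (0,-1,2,0)$ n times, and uses the witness $n = \max(1, x+1)$ (our choice, not the paper's). Enumerating a finite window of values only sanity-checks the formula; it is not a proof, and y, z are fixed to 0 because the guard depends only on x.

```python
# Sanity check (not a proof) of: forall x0, n1 . exists n > 0 . x0 - n1 - 1 - n < 0
EFFECT = (0, -1, 2, 0)  # ef(A_C) acting on configurations (x0, x, y, z)


def apply_effect(config, n, effect=EFFECT):
    """Configuration reached after n iterations of a circuit with constant effect."""
    return tuple(c + n * e for c, e in zip(config, effect))


def witness(x0, n1):
    """A value n > 0 making the guard x - n < 0 true, with x = x0 - n1 - 1."""
    x = x0 - n1 - 1
    return max(1, x + 1)


def formula_holds_on_window(bound=30):
    """Check the formula for all x0, n1 in [-bound, bound] using the witness."""
    for x0 in range(-bound, bound + 1):
        for n1 in range(-bound, bound + 1):
            x = x0 - n1 - 1
            n = witness(x0, n1)
            _, x_after, _, _ = apply_effect((x0, x, 0, 0), n)
            if not (n > 0 and x_after < 0):
                return False
    return True


print(formula_holds_on_window())  # True
```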

8 Conclusion 

We have presented a method for verifying safety and liveness properties of fair parameterized extended automata (PEA). Our method is based on symbolic reachability analysis using powerful symbolic representation structures and extrapolation techniques. We have mainly developed techniques for determining whether a component (i.e., a loop or some kind of nested loops) in the symbolic graph is boundedly iterable, i.e., corresponds to the abstraction of a terminating component of the original model. Our techniques can deal with complex systems that cannot be analyzed by existing automatic techniques, e.g., those based on the synthesis of ranking functions [?]. In particular, we have shown how our techniques can be used to automatically analyze a parameterized version of the BRP protocol.
Abstract. In this paper we study the relation between the lack of completeness in abstract interpretation of model-checking and the structure of the counterexamples produced by a model-checker. We consider two dual forms of completeness of an abstract interpretation: forward and backward completeness. They correspond respectively to the standard γ- and α-completeness of an abstract interpretation and can be related with each other by adjunction. We give a constructive characterization of
Clarke et al.’s spurious counterexamples in terms of both forward and 
backward completeness of the underlying abstract interpretation. This 
result allows us to understand the structure of the counterexamples that 
can be removed by systematically refining abstract domains to achieve 
completeness with respect to a given operation. We apply our result to 
improve static program analysis by refining the model-checking of an 
abstract interpretation. 

Keywords: Completeness, Model-checking, Verification, Abstract Inter- 
pretation, Domain Refinement, Program Analysis. 



1 Introduction 

Many authors have recognized that modifying abstract models by modifying abstractions has great potential for improving abstract model-checking, both in precision and in complexity (e.g., see Section 9 in [?]), but few applications of these techniques are known in the field of model-checking. In this paper we observe that there exists a strong connection between the standard notion of complete abstract interpretation and the corresponding one for abstract model-checking [?], and we show how the latter can be achieved by minimally modifying abstract domains.

Completeness in abstract interpretation amounts to requiring that no loss of precision is introduced by approximating a semantic function computed on abstract objects with respect to approximating the same computation on concrete objects. Therefore, no loss of precision is accumulated in abstract computations by approximating concrete input objects. This property comes directly from the original definition of Galois-connection based abstract interpretation [?]: if α and γ are a pair of adjoint functions in a Galois connection specifying an abstract interpretation, and f and $f^\sharp$ are respectively the concrete and abstract operations, we say that the abstract interpretation is complete if $\alpha \circ f = f^\sharp \circ \alpha$.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 356-373, 2001.
Recently, Giacobazzi et al. [?] observed that completeness for an abstract interpretation, i.e., for abstract domains and abstract operations, only depends upon the structure of the underlying abstract domains and, in particular, that it is always possible to make any abstract domain A complete with respect to a concrete function f by minimally extending or even reducing A. This process is known as the complete shell (core) of A. This means that a systematic domain transformer can be introduced to improve the precision of domains with respect to a given function by making the new domains complete with respect to this function.



The Problem 

The idea of abstract (state-)model-checking is to verify temporal properties against an approximated model which is systematically derived from the concrete semantics of the system we want to analyze. This is always achieved by approximating the information contained in its states. Since the pioneering work on model-checking and abstraction by Clarke et al. [2], a number of works have applied this idea to reduce the phenomenon of “state explosion” (e.g., [?]). Indeed, verifying a temporal logic formula against a model is in general a hard problem (PSPACE-complete for CTL*) [?]. The problem of refining abstract model-checking is precisely that of improving precision in temporal logic verification by refining a state partition (called an abstraction in [?]) that turns out to provide too rough an abstract model for verifying a given temporal property of interest. Roughly speaking, by refining the abstraction in abstract model-checking we gain precision, namely the refined model becomes more selective (being closer to the concrete model) than the abstract one. Moreover, the size of the new refined abstract model has to be kept as small as possible in order to avoid re-introducing the “state explosion” phenomenon. Formulated in this way, this is precisely a domain refinement problem: improve abstractions (viz. enhance domains) in such a way that the new abstraction contains the least amount of information needed to achieve a given degree of precision [?].
It is not an easy task to apply the standard abstract domain refinement theory ([?]) to abstract model-checking refinement, for two reasons. (1) Even though any state partition can be associated with a Galois connection, the converse does not hold in general. Therefore it is not always immediate to associate a refined state partition with a refined abstract domain (or, equivalently, Galois connection), as the latter may not correspond to a state partition. (2) We need to express the meaning of improving precision in abstract model-checking by refining domains. This has to be related to the structure of the formulas that are verified in the new refined model. We solve both these problems in the context of abstract domain refinements that achieve complete abstract model checking.
Main Results

In abstract model-checking, soundness means that when a property is true in the abstract model it is also true in the concrete one. However, the approximation may not be complete in the sense of model-checking: if the property of interest is false in the abstract model, the counterexample produced by the model-checker may be the result of some particular traces in the approximated model which are not present in the concrete one, i.e., which do not correspond to any concrete computation. In [?] this phenomenon has been isolated, and these counterexamples have been called spurious. We consider spurious counterexamples as a measure of the precision achieved in systematically refining abstract model-checking by standard domain transformers. We first introduce the notions of backward and forward completeness for an abstract interpretation. While backward completeness corresponds precisely to the standard notion of completeness for an abstract interpretation [?], forward completeness is less well known and corresponds to what is sometimes called “exactness” of an abstract interpretation (e.g., [?]). We show that forward and backward completeness
are dual notions, and that, as for backward completeness, forward completeness can also be achieved by minimally refining abstract domains. Then we observe that any abstract domain refinement induces a refined state partition, which in turn reduces the amount of spurious counterexamples in the refined abstract model. Finally, we show that the counterexample-guided abstraction refinement by Clarke et al. [?] corresponds precisely to iteratively computing the backward (forward) complete shell of the corresponding abstraction with respect to the forward (backward) state transition function. This provides a purely algebraic characterization of spurious counterexamples in terms of standard abstract interpretation theory. Moreover, this result allows us to characterize the structure of spurious counterexamples that can be removed by refining a domain to achieve completeness with respect to an arbitrary function, thereby covering most domain refinements known in the literature. Our results are applied to the systematic refinement of data-flow analysis, viewed as the model-checking of abstract interpretations, as formulated in [?].



Related Works 

The body of research on the connection among model-checking, abstract interpretation and data-flow analysis is huge, even though this field still represents a major challenge in formal methods. In particular, the problem of completeness in abstract model-checking has been studied by a number of authors, both theoretically and from the point of view of improving model-checking algorithms. In [?] the authors proposed an automatic refinement technique which uses information obtained from erroneous counterexamples produced by a model-checker. They introduced the notion of spurious counterexample and gave an algorithm based on the inverse image of the forward transition function to refine the abstraction. However, they did not consider the classical notions of abstract interpretation in order to improve the precision of the approximated models.
In our work we prove that this algorithm can be formulated as an instance of the more general problem of making an abstraction complete in the standard sense of abstract interpretation. In [?] Cousot and Cousot introduced a novel general temporal language, inspired by Kozen’s μ-calculus, which includes most standard specification languages such as CTL and CTL* as special cases. The authors proved that the classical state-based model-checking of this language is an abstract interpretation of its trace-based semantics, which turns out to be incomplete. More recently, Ranzato [?] found the complete shell, in the sense of [?], of Cousot’s state-based model-checking abstraction. This construction has been applied to the whole specification language introduced in [?], proving that there exists no complete abstraction which includes the state-model-checking and is strictly more abstract than the trace-based model-checking. This proves that it is impossible for state-model-checking to achieve completeness with respect to trace-model-checking without including the traces themselves. In [?] the authors introduced a new concept of completeness, called partial completeness, for problems which are intended to check abstract fix-points for specifications. A checking algorithm is partially complete when, in case of termination, the answer is exact. This notion can be directly applied in model-checking problems, where one is interested in partially complete abstractions, i.e., approximations of a given model which always yield an affirmative answer when the specification is correct (i.e., it holds in the original model) and the algorithm terminates. This notion of completeness seems different from the one considered in this paper, but further research is needed to better understand the relation between these two notions of completeness.



Structure of the Paper 

The paper is organized as follows: In Sections 2 and 3 we recall first the main 
notions concerning temporal logic and model checking, and then the standard 
theory of Galois-connection based abstract interpretation. The main results of 
the paper are in Section 4, Section 5, and Section 6. In Section 4 we introduce 
forward and backward completeness of an abstract interpretation and we study 
the relation between these two notions and a constructive method to minimally 
modify domains in order to achieve either forward or backward completeness. 
In Section 5 we introduce abstract model checking refinement by domain trans- 
formers and in Section 6 we characterize the precision of a refined abstract model 
in terms of Clarke’s spurious counterexamples. Finally, in Section 7 we give a model-checking perspective of the problem of complete data-flow analysis. Section 8 concludes with directions for future work.



2 Temporal Logic and Model-Checking 

In the following we consider a fragment ∀CTL* of the branching time temporal logic CTL* [?]: the formulas of ∀CTL* do not contain existential quantifiers. Of course, all the results apply to the universal fragment of the weaker language CTL as well. In ∀CTL*, universal properties are expressed through the path quantifier ∀ (“for all futures”), which quantifies over (infinite) execution sequences. The temporal operators G (Generally, always), F (Finally, sometime), X (neXt time), and U (Until) express properties of a single execution sequence. These operators, as well as other syntactic possibilities, can be freely nested in a formula. Given a set Prop of propositions, the set Lit of literals is defined as $Lit = Prop \cup \{\neg q \mid q \in Prop\} \cup \{true, false\}$. State formulas φ and path formulas ψ are inductively defined by the following grammar, where $p \in Lit$:

State formulas: $\varphi ::= p \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \forall \psi$

Path formulas: $\psi ::= \varphi \mid \psi \wedge \psi \mid \psi \vee \psi \mid G\psi \mid F\psi \mid X\psi \mid U(\psi, \psi)$

Observe that negation is allowed only for literals. Therefore, the results that allow us to write the existential modal operator using universal formulas do not hold here. Observe also that F is superfluous because $F\psi = U(true, \psi)$.

A transition system is a pair (Σ, R) consisting of a set Σ of states and a transition relation $R \subseteq \Sigma \times \Sigma$ that is assumed to be total. A Kripke structure is a tuple $M = (\Sigma, R, I, \|\cdot\|)$ where (Σ, R) is a transition system, $I \subseteq \Sigma$ is the set of initial states, and $\|\cdot\|: Lit \to \wp(\Sigma)$ is the interpretation function such that $\|p\| = \{s \in \Sigma \mid s \models p\}$. For ∀CTL*, the notion of satisfaction of a state formula φ by a state s ($s \models \varphi$) is as usual [10]. If $M = (\Sigma, R, I, \|\cdot\|)$ is a Kripke structure, we say that $M \models \varphi$ if and only if $\forall s \in I: s \models \varphi$. Given a temporal formula φ, the satisfiability problem for φ is that of checking whether there exists a Kripke structure M such that $M \models \varphi$. In the case of CTL* (hence of ∀CTL*) this problem is decidable [?]. For verification purposes, the following problem is known as the (global) model-checking problem (MCP): given $M = (\Sigma, R, I, \|\cdot\|)$ and a formula φ, check whether $M \models \varphi$.
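For a finite Kripke structure, the MCP for the simple formula ∀G p can be sketched as follows. This relies on the standard fact (not stated in this section) that the states satisfying ∀G p form the greatest fixpoint of $Z = \|p\| \cap \widetilde{pre}[R](Z)$, where $\widetilde{pre}[R]$ is the backward predicate transformer defined in Section 6. Representing states as Python ints and R as a set of pairs is our own assumption.

```python
def pre_tilde(states, R, X):
    """pre~[R](X): states all of whose R-successors lie in X."""
    return {s for s in states if all(t in X for (u, t) in R if u == s)}


def check_forall_G(states, R, init, sat_p):
    """Decide M |= forall G p for a literal p, where sat_p = ||p||.

    Iterates Z := ||p|| & pre~[R](Z) downward from ||p|| until it stabilizes
    (the greatest fixpoint), then checks that all initial states are in it.
    """
    Z = set(sat_p)
    while True:
        Z_next = set(sat_p) & pre_tilde(states, R, Z)
        if Z_next == Z:
            break
        Z = Z_next
    return set(init) <= Z


states = {0, 1, 2}
R = {(0, 1), (1, 2), (2, 2)}                 # total: every state has a successor
print(check_forall_G(states, R, {0}, {0, 1}))  # False: 0 -> 1 -> 2 leaves ||p||
print(check_forall_G(states, R, {2}, {2}))     # True
```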



3 Abstract Interpretation 

In the following $(C, \le, \vee, \wedge, \top, \bot)$ denotes a complete lattice C, with ordering ≤, lub ∨, glb ∧, greatest element (top) ⊤, and least element (bottom) ⊥. The downward closure of $S \subseteq C$ is defined as $\downarrow S = \{x \in C \mid \exists y \in S.\ x \le y\}$; $\downarrow x$ is a shorthand for $\downarrow\{x\}$, while the upward closure ↑ is dually defined. The notation $C \cong D$ denotes that C and D are isomorphic (possibly ordered) structures. Recall that a function $f: C \to D$ is (Scott-)continuous if f preserves lubs of (nonempty) chains, equivalently if f preserves lubs of directed subsets. In the following we consider Galois-connection based abstract interpretation [?]. A pair of functions $f: A \to B$ and $g: B \to A$ forms an adjunction if

$$\forall x \in A\ \forall y \in B.\ f(x) \le_B y \iff x \le_A g(y).$$

f (resp. g) is the left (right) adjoint of g (f), and it is an additive (co-additive) function, i.e., f preserves lubs of all subsets of A (g preserves glbs of all subsets of B), the empty set included. Additive and co-additive functions f admit respectively a right and a left adjoint: $f^+ = \lambda x. \bigvee\{y \mid f(y) \le x\}$ and $f^- = \lambda x. \bigwedge\{y \mid x \le f(y)\}$.

Remember that $(f^+)^- = (f^-)^+ = f$. A Galois connection (GC for short) is an adjunction between posets, and it is denoted $(A, f, B, f^+)$. In GC-based abstract interpretation the concrete and abstract domains, C and A, are often assumed to be complete lattices and are related by an abstraction map $\alpha: C \to A$ and a concretization map $\gamma: A \to C$ forming a GC $(C, \alpha, A, \gamma)$. If in addition $\forall a \in A.\ \alpha(\gamma(a)) = a$, then we call $(C, \alpha, A, \gamma)$ a Galois insertion (GI) of A in C. When $(C, \alpha, A, \gamma)$ is a GI, each value of the abstract domain A is useful in representing C, because all the elements of A represent distinct members of C, γ being 1-1. Any GC may be lifted to a GI by identifying in an equivalence class those values of the abstract domain with the same concretization. This process is known as reduction of the abstract domain. An upper closure operator on a poset C is an operator $\rho: C \to C$ which is monotone, idempotent, and extensive ($\forall x \in C.\ x \le \rho(x)$). Lower closures are dually defined. The set of all upper (lower) closure operators on C is denoted by uco(C) (lco(C)). Each closure is uniquely determined by the set of its fix-points ρ(C). A fundamental property of closure operators is that if C is a complete lattice then both $(uco(C), \sqsubseteq, \sqcup, \sqcap, \lambda x.\top, \lambda x.x)$, ordered pointwise, and $(\rho(C), \le, \vee_\rho, \wedge, \top, \rho(\bot))$, with $\vee_\rho X = \rho(\vee X)$, are complete lattices. It has been well known since [?] that abstract domains can be equivalently specified either as Galois insertions or as (sets of fix-points of) upper closures on the concrete domain. In particular, $X \subseteq C$ is the set of fix-points of an upper closure ρ on C iff X is a Moore family of C, i.e., $X = \mathcal{M}(X) = \{\bigwedge S \mid S \subseteq X\}$, where $\bigwedge\varnothing = \top \in \mathcal{M}(X)$, iff X is isomorphic to an abstract domain A in a GI $(C, \alpha, A, \gamma)$, i.e., $A \cong \rho(C)$ with $\iota: \rho(C) \to A$ and $\iota^{-1}: A \to \rho(C)$ being the isomorphism, $(C, \iota\circ\rho, A, \iota^{-1})$ being the GI, and $\rho = \gamma\circ\alpha$. Therefore uco(C) is isomorphic to the so-called lattice of abstract interpretations of C [?]. For any $X \subseteq C$, $\mathcal{M}(X)$ is called the Moore closure of X in C, i.e., $\mathcal{M}(X)$ is the least (w.r.t. set inclusion) subset of C which contains X and is a Moore family of C. In this case $A \sqsubseteq B$ iff $B \subseteq A$ as Moore families of C, iff A is more concrete than B. In the following we find it particularly convenient to identify closure operators (and therefore abstract domains) with their sets of fix-points, possibly using capital Latin letters as notation.
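The correspondence between upper closures and Moore families can be made concrete on a finite powerset lattice. The sketch below, assuming C = ℘(U) for a small finite U and frozensets as lattice elements, computes the Moore closure $\mathcal{M}(X)$ and recovers the closure $\rho(x) = \bigwedge\{y \in X \mid x \le y\}$ from its fix-points. All function names are illustrative.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def moore_closure(family, universe):
    """M(X): least family containing X and closed under intersections.

    On a finite lattice, closing under binary meets and adding top
    (the empty meet) gives closure under arbitrary meets.
    """
    closed = set(family) | {frozenset(universe)}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if (a & b) not in closed:
                    closed.add(a & b)
                    changed = True
    return closed


def closure_of(fixpoints):
    """The upper closure rho whose fix-points are the given Moore family."""
    def rho(x):
        return frozenset.intersection(*[y for y in fixpoints if x <= y])
    return rho


def is_upper_closure(rho, universe):
    """Check monotonicity, extensivity and idempotence on the whole powerset."""
    P = powerset(universe)
    monotone = all(rho(x) <= rho(y) for x in P for y in P if x <= y)
    extensive = all(x <= rho(x) for x in P)
    idempotent = all(rho(rho(x)) == rho(x) for x in P)
    return monotone and extensive and idempotent


U = {0, 1, 2}
A = moore_closure([frozenset({0}), frozenset({1})], U)
print(sorted(sorted(s) for s in A))  # [[], [0], [0, 1, 2], [1]]
```

Because `closure_of` always finds the top element among the fix-points, the intersection is never taken over an empty list.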

4 Backward and Forward Completeness 

In abstract interpretation there are two equivalent ways to express the soundness of an abstraction [?]. Let C be a complete lattice, $f: C \to C$, $(C, \alpha, A, \gamma)$ be a Galois insertion, and $f^\sharp: A \to A$. Then $(C, \alpha, A, \gamma)$ and $f^\sharp$ provide a sound abstraction of f if $\alpha\circ f \le f^\sharp\circ\alpha$, or equivalently (by adjunction) if $f\circ\gamma \le \gamma\circ f^\sharp$. While these two definitions of soundness are equivalent, they are no longer equivalent when equality is required. In the first case, $\alpha\circ f = f^\sharp\circ\alpha$ means that no loss of precision is accumulated by approximating the input arguments of a given semantic function, while $f\circ\gamma = \gamma\circ f^\sharp$ means that no loss of precision is accumulated by approximating the result of computations on abstract objects. These two notions therefore represent two different forms of completeness: the first is related to the input arguments of a computation, while the second is related to its output. We distinguish here between these two forms of completeness: the first is called backward ($\mathcal{B}$) and the second forward ($\mathcal{F}$) completeness. The reason for these names will become clear later in the paper.

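Definition 1. Let C be a complete lattice, $f: C \to C$, $(C, \alpha, A, \gamma)$ be a Galois insertion, and $f^\sharp: A \to A$.

- $(C, \alpha, A, \gamma)$ and $f^\sharp$ are $\mathcal{B}$-complete for f if $\alpha\circ f = f^\sharp\circ\alpha$.

- $(C, \alpha, A, \gamma)$ and $f^\sharp$ are $\mathcal{F}$-complete for f if $f\circ\gamma = \gamma\circ f^\sharp$.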

$\mathcal{B}$-completeness (see [13]) corresponds to requiring that the approximate function $f^\sharp$ perfectly mimics the concrete function f when the latter is approximated in A, viz. both are compared in the abstract domain A. Conversely, $\mathcal{F}$-completeness corresponds to requiring that $f^\sharp$ perfectly mimics the function f applied to the same abstract arguments, viz. both are compared in the concrete domain C. The following proposition extends a result in [?] about $\mathcal{B}$-completeness to $\mathcal{F}$-completeness: there exists either a $\mathcal{B}$- or an $\mathcal{F}$-complete abstract function $f^\sharp$ in an abstract domain A iff the best correct approximation $\alpha\circ f\circ\gamma$ of f is respectively $\mathcal{B}$- or $\mathcal{F}$-complete. This means that, as for $\mathcal{B}$-completeness [?], $\mathcal{F}$-completeness is also a property of abstract domains, namely a property of the GI $(C, \alpha, A, \gamma)$.

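Proposition 1. Let C be a complete lattice, $f: C \to C$, and $(C, \alpha, A, \gamma)$ be a Galois insertion. There exists $f^\sharp: A \to A$ such that $(C, \alpha, A, \gamma)$ and $f^\sharp$ are $\mathcal{B}$-complete ($\mathcal{F}$-complete) for f iff $\alpha\circ f = \alpha\circ f\circ\gamma\circ\alpha$ ($f\circ\gamma = \gamma\circ\alpha\circ f\circ\gamma$).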

Therefore, we can define when a domain (closure operator) is $\mathcal{B}$/$\mathcal{F}$-complete for a semantic function as follows.

Definition 2. Let C be a complete lattice and $\rho \in uco(C)$ be an abstract domain. ρ is $\mathcal{B}$ ($\mathcal{F}$)-complete for f if $\rho\circ f = \rho\circ f\circ\rho$ ($f\circ\rho = \rho\circ f\circ\rho$).
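Definition 2 can be checked by brute force on a small powerset lattice. The sketch below assumes C = ℘({0, 1, 2}), represents a closure by its Moore family of fix-points, and uses a hypothetical monotone function `keep_zero` (X ↦ X ∩ {0}). For the domain {{0, 1}, U} it is $\mathcal{B}$-complete but not $\mathcal{F}$-complete, since $f(\rho(\varnothing)) = \{0\}$ is not a fix-point.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def closure_of(fixpoints):
    """Upper closure whose fix-points are the given Moore family."""
    def rho(x):
        return frozenset.intersection(*[y for y in fixpoints if x <= y])
    return rho


def is_B_complete(rho, f, universe):
    """Definition 2: rho . f == rho . f . rho on every lattice element."""
    return all(rho(f(x)) == rho(f(rho(x))) for x in powerset(universe))


def is_F_complete(rho, f, universe):
    """Definition 2: f . rho == rho . f . rho on every lattice element."""
    return all(f(rho(x)) == rho(f(rho(x))) for x in powerset(universe))


def keep_zero(X):
    """Hypothetical monotone function X -> X & {0}."""
    return X & frozenset({0})


U = {0, 1, 2}
rho = closure_of([frozenset({0, 1}), frozenset(U)])  # domain {{0,1}, U}
print(is_B_complete(rho, keep_zero, U), is_F_complete(rho, keep_zero, U))  # True False
```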

While $\mathcal{B}$-completeness is well known in abstract interpretation and corresponds to the standard notion of completeness [?], the notion of forward completeness is less known. $\mathcal{B}$-completeness means that the domain ρ is expressive enough that no loss of precision is accumulated by abstracting in ρ the arguments of a function f. Conversely, $\mathcal{F}$-completeness means that no loss of precision is accumulated by approximating in ρ the result of the function f computed on elements of ρ. This justifies the choice of the names backward and forward above. In the following we denote by $\mathcal{F}(C, f) = \{\rho \mid f\circ\rho = \rho\circ f\circ\rho\}$ and $\mathcal{B}(C, f) = \{\rho \mid \rho\circ f = \rho\circ f\circ\rho\}$ respectively the sets of $\mathcal{F}$- and $\mathcal{B}$-complete abstractions of C for f. It is worth noting that in general $\mathcal{F}(C, f) \not\subseteq \mathcal{B}(C, f)$ and $\mathcal{B}(C, f) \not\subseteq \mathcal{F}(C, f)$, namely they are incomparable notions.

Example 1. Let $Sign^+$ be the domain in Fig. 1, which is an abstraction of $(\wp(\mathbb{Z}), \subseteq)$ for the analysis of integer variables, and let $sq: \wp(\mathbb{Z}) \to \wp(\mathbb{Z})$ be the square operation defined as follows: $sq(X) = \{x^2 \mid x \in X\}$ for $X \in \wp(\mathbb{Z})$. Let



Incompleteness, Counterexamples, and Refinements 363 



$\rho \in uco(\wp(\mathbb{Z}))$ be the closure operator associated with $Sign^+$, i.e., $\rho = \gamma\circ\alpha$, where the abstraction and concretization maps are the most obvious ones. The best correct approximation of sq in $Sign^+$ is $sq^\sharp: Sign^+ \to Sign^+$ such that $sq^\sharp(X) = \rho(sq(X))$, with $X \in Sign^+$. It is easy to see that the upper closure operators $\rho_a = \{\mathbb{Z}, [0,+\infty], [0,10]\}$ and $\rho_b = \{\mathbb{Z}, [0,2], [0]\}$ satisfy the following facts: $\rho_a \in \mathcal{F}(Sign^+, sq^\sharp)$ but $\rho_a \notin \mathcal{B}(Sign^+, sq^\sharp)$ (for example, $\rho_a(sq^\sharp(\rho_a([0]))) = [0,+\infty]$ but $\rho_a(sq^\sharp([0])) = [0,10]$), and $\rho_b \in \mathcal{B}(Sign^+, sq^\sharp)$ but $\rho_b \notin \mathcal{F}(Sign^+, sq^\sharp)$ (e.g., $\rho_b(sq^\sharp(\rho_b([0,2]))) = \mathbb{Z}$ but $sq^\sharp(\rho_b([0,2])) = [0,10]$). The arrows in Fig. 1 show the results of $sq^\sharp$ that are not fix-points.
Fig. 1. The Abstract Domain $Sign^+$ and Two Abstractions.



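Clearly, when an abstract domain ρ is both $\mathcal{B}$- and $\mathcal{F}$-complete for f, then ρ commutes with f, i.e., $\rho\circ f = f\circ\rho$; namely $\mathcal{F}(C, f) \cap \mathcal{B}(C, f) = \{\rho \mid \rho\circ f = f\circ\rho\}$. Note that in Example 1, $\{\mathbb{Z}, [0,+\infty], [-\infty,0], [0]\} \in \mathcal{F}(Sign^+, sq^\sharp) \cap \mathcal{B}(Sign^+, sq^\sharp)$.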

The following result is the main result in [13], and it is the basis for a constructive characterization of the complete shell of an abstract domain, viz. the least (most abstract) domain which is $\mathcal{B}$-complete and includes a given domain. This result constructively characterizes the structure of $\mathcal{B}$-complete abstract domains for continuous functions. Recall that, if $f: C \to C$ is a unary function, then $f^{-1}(y) = \{x \mid f(x) = y\}$.

Theorem 1 ([13]). Let $f: C \to C$ be continuous and $\rho \in uco(C)$. Then ρ is $\mathcal{B}$-complete for f iff $\bigcup_{y \in \rho(C)} \max(f^{-1}(\downarrow y)) \subseteq \rho(C)$.

It is easy to see in Example 1 that $\rho_a$ is not $\mathcal{B}$-complete because it does not include the maximal inverse image of $sq^\sharp$ (see Fig. 1), namely the point [0,2].

An analogous, but simpler to prove, result can be stated for $\mathcal{F}$-completeness. In this case, we can characterize $\mathcal{F}$-complete domains for merely monotone operators.

Theorem 2. Let $f: C \to C$ be monotone and $\rho \in uco(C)$. Then ρ is $\mathcal{F}$-complete for f iff $\forall x \in \rho(C).\ f(x) \in \rho(C)$.
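Theorems 1 and 2 can be cross-checked against Definition 2 exhaustively on a tiny lattice. The sketch below, assuming C = ℘({0, 1, 2}) (where every monotone function is continuous, since directed sets have a maximum), enumerates all Moore families and compares the theorem-based tests with the definitions. The example functions (a `post` over a hypothetical relation and a non-additive monotone map) are illustrative choices.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def moore_families(universe):
    """Enumerate every Moore family (= every upper closure) on the powerset."""
    P = powerset(universe)
    top = frozenset(universe)
    for mask in range(1 << len(P)):
        fam = {P[i] for i in range(len(P)) if (mask >> i) & 1}
        if top in fam and all((a & b) in fam for a in fam for b in fam):
            yield fam


def closure_of(fixpoints):
    return lambda x: frozenset.intersection(*[y for y in fixpoints if x <= y])


def maximal(sets):
    return [s for s in sets if not any(s < t for t in sets)]


def B_complete_thm1(fixpoints, f, universe):
    """Theorem 1: max f^{-1}(down y) is included in rho(C) for each y in rho(C)."""
    P = powerset(universe)
    return all(m in fixpoints
               for y in fixpoints
               for m in maximal([x for x in P if f(x) <= y]))


def F_complete_thm2(fixpoints, f):
    """Theorem 2: rho(C) is closed under the direct image of f."""
    return all(f(x) in fixpoints for x in fixpoints)


def agree_with_definitions(f, universe):
    """Compare both theorems with Definition 2 on every closure."""
    P = powerset(universe)
    for fam in moore_families(universe):
        rho = closure_of(fam)
        b_def = all(rho(f(x)) == rho(f(rho(x))) for x in P)
        f_def = all(f(rho(x)) == rho(f(rho(x))) for x in P)
        if b_def != B_complete_thm1(fam, f, universe) or f_def != F_complete_thm2(fam, f):
            return False
    return True


R = {(0, 1), (1, 2), (2, 2)}
post_R = lambda X: frozenset(t for (s, t) in R if s in X)  # monotone (additive)
print(agree_with_definitions(post_R, {0, 1, 2}))           # True
```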
Therefore, while $\mathcal{B}$-complete domains are closed under (maximal) inverse images of the function f, $\mathcal{F}$-complete domains are closed under direct images of f. It is easy to see in Example 1 that $\rho_b$ is not $\mathcal{F}$-complete because it does not include the direct image of $sq^\sharp$ (see Fig. 1), namely the points [0,10] and [0,+∞]. Together, Theorems 1 and 2 establish a strong relation between $\mathcal{B}$- and $\mathcal{F}$-completeness, which can be expressed in terms of adjunction when f admits a right adjoint $f^+$.

Corollary 1. Let C be a complete lattice and $f: C \to C$ be an additive function. Then $\mathcal{B}(C, f) = \mathcal{F}(C, f^+)$.

Moreover, in view of Theorems 1 and 2, it is always possible to associate with each continuous semantic function $f: C \to C$ a corresponding domain refinement that transforms any abstract domain A into the closest (most abstract) $\mathcal{B}$/$\mathcal{F}$-complete domain for f which includes (is more concrete than) A. In the first case we obtain the ($\mathcal{B}$-)complete shell of [?], while in the second we obtain what we call the $\mathcal{F}$-complete shell.

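Definition 3. Let C be a complete lattice and $f: C \to C$ be a continuous function. We define $\mathbb{R}^f_{\mathcal{B}}: uco(C) \to uco(C)$ and $\mathbb{R}^f_{\mathcal{F}}: uco(C) \to uco(C)$ such that:

- $\mathbb{R}^f_{\mathcal{B}} = \lambda X \in uco(C).\ \mathcal{M}\big(\bigcup_{y \in X} \max(f^{-1}(\downarrow y))\big)$;

- $\mathbb{R}^f_{\mathcal{F}} = \lambda X \in uco(C).\ \mathcal{M}(f(X))$.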

It is immediate to prove that both $\mathbb{R}^f_{\mathcal{B}}$ and $\mathbb{R}^f_{\mathcal{F}}$ are monotone operators on uco(C). The following result, which follows from Theorems 1 and 2, characterizes in a unique domain-equational form the $\mathcal{B}$/$\mathcal{F}$-complete shell of abstract domains for a continuous function $f: C \to C$.

Theorem 3. Let $f: C \to C$ be a continuous function and $A \in uco(C)$ be an abstract domain. Let $\ell \in \{\mathcal{B}, \mathcal{F}\}$. Then

X is ℓ-complete for f and $X \sqsubseteq A$ iff $X \sqsubseteq A \sqcap \mathbb{R}^f_\ell(X)$.

Therefore, the greatest (viz. most abstract) domain which includes A and which is ℓ-complete for f is $\mathbb{S}^f_\ell(A) = gfp(\lambda X.\ A \sqcap \mathbb{R}^f_\ell(X))$. This domain is called the ℓ-complete shell of A with respect to f. Note that, as proved for the $\mathcal{B}$-complete shell in [?], $\mathbb{R}^f_\ell$ above is a co-additive operator on the lattice of abstract interpretations uco(C), and that $\lambda X.\ \mathbb{S}^f_\ell(X)$ is a monotone, idempotent, and reductive operator on uco(C), i.e., $\mathbb{S}^f_\ell \in lco(uco(C))$.
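On a finite powerset lattice the two shells can be computed by iteration. The sketch below assumes closures are represented by their Moore families of fix-points, so that the meet $\sqcap$ in uco(C) becomes the Moore closure of a union; starting from A and adding the direct images (for $\mathcal{F}$) or the maximal inverse images of Theorem 1 (for $\mathcal{B}$) until nothing changes should yield the gfp of Theorem 3. Function names and the toy functions are illustrative.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def moore_closure(family, universe):
    closed = set(family) | {frozenset(universe)}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if (a & b) not in closed:
                    closed.add(a & b)
                    changed = True
    return closed


def F_shell(A, f, universe):
    """Most abstract Moore family containing A and closed under f (cf. Theorem 2)."""
    current = moore_closure(A, universe)
    while True:
        nxt = moore_closure(current | {f(x) for x in current}, universe)
        if nxt == current:
            return current
        current = nxt


def B_shell(A, f, universe):
    """Most abstract Moore family containing A and its maximal inverse images (cf. Theorem 1)."""
    P = powerset(universe)
    current = moore_closure(A, universe)
    while True:
        extra = set()
        for y in current:
            pre = [x for x in P if f(x) <= y]
            extra |= {m for m in pre if not any(m < t for t in pre)}
        nxt = moore_closure(current | extra, universe)
        if nxt == current:
            return current
        current = nxt


U = {0, 1, 2}
keep_zero = lambda X: X & frozenset({0})           # hypothetical monotone f
print(sorted(sorted(s) for s in F_shell([frozenset({0, 1})], keep_zero, U)))
# [[0], [0, 1], [0, 1, 2]]
```

The families only grow during the iteration, so on a finite lattice both loops terminate.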

5 Refining State Partitions vs Refining Domains 

In this section we study how to systematically refine abstract model-checking, namely we consider functions of type $\wp(\Sigma) \to \wp(\Sigma)$, with Σ being the set of states in a transition system (Σ, R). The following definition introduces the notion of abstract Kripke structure according to Dams et al. [?].
Definition 4. Let $M = (\Sigma, I, R, \|\cdot\|)$ be a Kripke structure and $(\wp(\Sigma), \alpha, A, \gamma)$ be a GI. The corresponding abstract Kripke structure $M^\sharp = (A, I^\sharp, R^\sharp, \|\cdot\|^\sharp)$ is defined as follows:

- $a \in I^\sharp$ iff $\exists d \in I.\ \alpha(\{d\}) = a$;

- $R^{\exists\exists} = \{(X, Y) \mid \exists x \in X\ \exists y \in Y.\ R(x, y)\}$;

- $R^\sharp(a_1, a_2)$ iff $a_2 \in \{\alpha(Y) \mid Y \in \min\{Y' \mid R^{\exists\exists}(\gamma(a_1), Y')\}\}$;

- $\|p\|^\sharp = \{a \in A \mid \gamma(a) \subseteq \|p\|\}$.

Note that for any $a \in A$, the minimal sets Y' such that $R^{\exists\exists}(\gamma(a), Y')$ are all singletons.
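When the GI is induced by a state partition h (as discussed next), Definition 4 becomes easy to compute: since the minimal Y' are singletons {y}, and $\alpha(\{y\}) = h(y)$, $R^\sharp$ simply relates h(x) to h(y) for every concrete transition. The sketch below assumes finite states, R as a set of pairs, and a dictionary `sat` mapping each literal to $\|p\|$; these representations are our own.

```python
def abstract_kripke(states, init, R, h, sat):
    """Abstract Kripke structure of Definition 4 for the GI induced by h.

    Abstract states are the values of h; sat maps each literal p to ||p||.
    """
    abs_states = {h(s) for s in states}

    def gamma(a):
        return {s for s in states if h(s) == a}

    init_abs = {h(d) for d in init}
    # Minimal Y' with R^{EE}(gamma(a1), Y') are singletons {y} where y is an
    # R-successor of some x in gamma(a1); alpha({y}) = h(y).
    R_abs = {(h(x), h(y)) for (x, y) in R}
    # An abstract state satisfies p only if its whole concretization does.
    sat_abs = {p: {a for a in abs_states if gamma(a) <= set(S)}
               for p, S in sat.items()}
    return abs_states, init_abs, R_abs, sat_abs


A, I, R_abs, sat_abs = abstract_kripke(
    range(4), {0}, {(0, 1), (1, 2), (2, 3), (3, 3)}, lambda x: x % 2, {"p": {0, 2, 3}})
print(R_abs == {(0, 1), (1, 0), (1, 1)}, sat_abs)  # True {'p': {0}}
```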

In the following we consider abstract domains as specified by a GI, or equivalently by a closure operator on ℘(Σ), where Σ is the set of states of a transition system (Σ, R). This is apparently in contrast with the original (standard) definition of abstract model-checking by Clarke et al. [?]. In this latter case the abstraction is specified by a surjective function, called a state partition, $h: \Sigma \to S$ on a given set S of abstract states. Notice that any surjection $h: \Sigma \to S$ induces an equivalence relation $\equiv_h$ on the set of states Σ in the following way: for $a, b \in \Sigma$, $a \equiv_h b$ iff $h(a) = h(b)$. Given a surjection $h: \Sigma \to S$ we can immediately derive a GI $(\wp(\Sigma), \alpha_h, \wp(S), \gamma_h)$, where $\alpha_h(X) = \{h(x) \mid x \in X\}$ is the abstraction function and $\gamma_h(Y) = \{x \in \Sigma \mid h(x) \in Y\}$ is the concretization function. Note that the GI is defined by considering ℘(S) instead of S, because the set of abstract states S is not assumed to be partially ordered.
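The claim that a surjection h induces a GI between ℘(Σ) and ℘(S) can be checked directly for a small instance. The sketch below, assuming a hypothetical partition $h(x) = x \bmod 3$ on six states, verifies the adjunction $\alpha_h(X) \subseteq Y \iff X \subseteq \gamma_h(Y)$ and the insertion property $\alpha_h \circ \gamma_h = id$, which holds because h is surjective onto its image.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def alpha_h(h, X):
    """alpha_h(X) = { h(x) | x in X }."""
    return frozenset(h(x) for x in X)


def gamma_h(h, states, Y):
    """gamma_h(Y) = { x | h(x) in Y }."""
    return frozenset(s for s in states if h(s) in Y)


def is_galois_insertion(h, states):
    """Check the adjunction and alpha_h . gamma_h = id on all subsets."""
    S = {h(s) for s in states}  # h is surjective onto its image S
    concrete, abstract = powerset(states), powerset(S)
    adjunction = all((alpha_h(h, X) <= Y) == (X <= gamma_h(h, states, Y))
                     for X in concrete for Y in abstract)
    insertion = all(alpha_h(h, gamma_h(h, states, Y)) == Y for Y in abstract)
    return adjunction and insertion


print(is_galois_insertion(lambda x: x % 3, range(6)))  # True
```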

The main problem in combining this approach with abstract domain refinement is that, in general, if $f: \wp(\Sigma) \to \wp(\Sigma)$ is some predicate transformer on the transition system (Σ, R), then the refined domain $\mathbb{S}^f_\ell(\gamma_h\circ\alpha_h) \in uco(\wp(\Sigma))$, with $\ell \in \{\mathcal{F}, \mathcal{B}\}$, may not be of the form ℘(S) for some set of abstract states S. In particular, $\mathbb{S}^f_\ell(\gamma_h\circ\alpha_h)$ may not be a Boolean algebra, and therefore need not be the powerset of any set. Because we abstract state-model-checking, it is sufficient to observe how the refined domain abstracts each single state. Indeed, any abstract domain refinement induces a state partition in the following way. Let $\bar S = \{\mathbb{S}^f_\ell(\gamma_h\circ\alpha_h)(\{x\}) \mid x \in \Sigma\}$; then the function $\bar h = \lambda x \in \Sigma.\ \mathbb{S}^f_\ell(\gamma_h\circ\alpha_h)(\{x\})$ is a surjection from Σ onto $\bar S$ such that $|S| \le |\bar S|$ and $(\wp(\Sigma), \alpha_{\bar h}, \wp(\bar S), \gamma_{\bar h})$ is a GI. This corresponds to improving the partition induced by h into a finer partition of the concrete states, as induced by $\bar h$. This improvement is obtained by considering a new set of abstract states, each one consisting of the (unique) best possible abstraction of each concrete state according to the refined closure $\mathbb{S}^f_\ell(\gamma_h\circ\alpha_h)$.

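Proposition 2. Let $h: \Sigma \to S$ be a state partition and $(\wp(\Sigma), \alpha_h, \wp(S), \gamma_h)$ be the corresponding GI. If $\mathbb{S}: uco(\wp(\Sigma)) \to uco(\wp(\Sigma))$ is a domain refinement, i.e., a monotone function such that $\forall X \in uco(\wp(\Sigma)).\ \mathbb{S}(X) \sqsubseteq X$, then the function $h_{\mathbb{S}}: \Sigma \to \{\mathbb{S}(\gamma_h\circ\alpha_h)(\{x\}) \mid x \in \Sigma\}$ such that $h_{\mathbb{S}} = \lambda x.\ \mathbb{S}(\gamma_h\circ\alpha_h)(\{x\})$ is a refinement of the state partition h.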

Therefore, the standard theory of abstract domain refinement (see [13]) can be applied fully and directly to refine any abstract (state-)model-checking problem. In this latter case, a refined state partition function can be systematically derived from the refined abstract domain as described in Proposition 2.

Example 2. Consider the transition system ([0,9], R), where the transition relation R is shown on the left in Fig. 2 with continuous lines. Consider the state partition function h(x) = x mod 4. Following Definition 4, this induces an abstract Kripke structure associated with the GI $(\wp([0,9]), \alpha_h, \wp(\{a,b,c,d\}), \gamma_h)$, as depicted in Fig. 2 with empty boxes and dashed lines. Let $f: \wp([0,9]) \to \wp([0,9])$ be such that $f(X) = \{3\}$. The forward-complete shell of $\gamma_h\circ\alpha_h$ is $gfp(\lambda X.\ \gamma_h\circ\alpha_h \sqcap \mathbb{R}^f_{\mathcal{F}}(X)) = \mathbb{S}^f_{\mathcal{F}}(\gamma_h\circ\alpha_h)$, and it is immediate to observe that it induces a new state partition function $\bar h$ such that

$$\bar h(x) = \begin{cases} x & \text{if } x = 3, 7 \\ h(x) & \text{otherwise} \end{cases}$$

The GI $(\wp([0,9]), \alpha_h, \wp(\{a,b,c,d\}), \gamma_h)$ is not $\mathcal{F}$-complete for f: e.g., if $\rho_h = \gamma_h\circ\alpha_h$, then $f(\rho_h(\{2,6\})) = \{3\}$ but $\rho_h(f(\rho_h(\{2,6\}))) = \{3,7\}$. In this case, in order to obtain the complete shell of $\rho_h$ we have to add the abstract state {3} to the abstract domain. This splits the abstract state {3,7} into two more concrete states: {3}, which is introduced by adding the forward image of f, and {7}, which is a new state (actually, a renaming of the state {3,7}) corresponding to the (unique) best possible approximation in $\mathbb{S}^f_{\mathcal{F}}(\gamma_h\circ\alpha_h)$ of the concrete state 7. This induces a refined abstract Kripke structure associated with the refined GI $(\wp([0,9]), \alpha_{\bar h}, \wp(\{a,b,c,d,e\}), \gamma_{\bar h})$, as shown on the right in Fig. 2.
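Because the forward-complete shell in Example 2 depends only on h and f (not on R), it can be recomputed mechanically. The sketch below represents $\rho_h = \gamma_h\circ\alpha_h$ by its fix-points (all unions of h-blocks), iterates $X \mapsto \mathcal{M}(X \cup f(X))$, and groups concrete states by their best abstraction $\rho(\{x\})$ as in Proposition 2. The set-based representation and helper names are our own assumptions.

```python
from itertools import combinations

STATES = frozenset(range(10))  # the states [0, 9] of Example 2


def h(x):
    """State partition of Example 2: four abstract states."""
    return x % 4


def f(X):
    """The constant function f(X) = {3} of Example 2."""
    return frozenset({3})


def gamma_h(Y):
    return frozenset(s for s in STATES if h(s) in Y)


def moore_closure(family):
    """Close a family of subsets of STATES under intersection, adding top."""
    closed = set(family) | {STATES}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if (a & b) not in closed:
                    closed.add(a & b)
                    changed = True
    return closed


def closure_of(fixpoints):
    return lambda x: frozenset.intersection(*[y for y in fixpoints if x <= y])


def F_shell(fixpoints):
    """Forward-complete shell for f, computed by iteration."""
    current = moore_closure(fixpoints)
    while True:
        nxt = moore_closure(current | {f(x) for x in current})
        if nxt == current:
            return current
        current = nxt


# Fix-points of rho_h = gamma_h . alpha_h: all unions of h-blocks.
rho_h_fix = {gamma_h(frozenset(Y)) for r in range(5)
             for Y in combinations(range(4), r)}
shell = F_shell(rho_h_fix)
rho = closure_of(shell)


def refined_partition():
    """Group concrete states by their best abstraction rho({x})."""
    classes = {}
    for x in sorted(STATES):
        classes.setdefault(rho(frozenset({x})), []).append(x)
    return sorted(classes.values())


print(refined_partition())  # [[0, 4, 8], [1, 5, 9], [2, 6], [3], [7]]
```

The five classes match the five abstract states a, b, c, d, e of the refined GI: the old block {3, 7} is split into {3} and {7}.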
Fig. 2. An Abstract Kripke Structure and Its Refinement.
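The completeness failure in Example 2 can be checked by brute force. The Python sketch below is our own illustration, not the authors' code. It encodes a partition by its closure $\rho(X)$, the union of the blocks that meet $X$, and names the blocks of $h$ a to d. It also assumes that the fresh block holding state 3 is called e. The transition relation of Fig. 2 is not needed, because $f$ is constant.

```python
from itertools import combinations

STATES = range(10)  # the concrete states [0, 9] of Example 2

def blocks_of(h):
    """Group the states into blocks according to the partition function h."""
    blocks = {}
    for x in STATES:
        blocks.setdefault(h(x), set()).add(x)
    return [frozenset(b) for b in blocks.values()]

def closure(blocks):
    """rho(X) = union of the blocks that intersect X (best approximation of X)."""
    def rho(xs):
        return frozenset().union(*(b for b in blocks if b & xs))
    return rho

def f(xs):
    return frozenset({3})  # the constant function f(X) = {3}

def forward_complete_at(rho, xs):
    """Forward completeness of rho for f at X: rho(f(rho(X))) == f(rho(X))."""
    return rho(f(rho(xs))) == f(rho(xs))

def all_subsets():
    for k in range(len(STATES) + 1):
        for c in combinations(STATES, k):
            yield frozenset(c)

h = lambda x: "abcd"[x % 4]  # h(x) = x mod 4, blocks named a..d
rho_h = closure(blocks_of(h))
print(sorted(rho_h(f(rho_h(frozenset({2, 6}))))))    # [3, 7], while f(...) = {3}
print(forward_complete_at(rho_h, frozenset({2, 6})))  # False

h_hat = lambda x: "e" if x == 3 else h(x)  # split {3, 7} into {3} and {7}
rho_hat = closure(blocks_of(h_hat))
print(all(forward_complete_at(rho_hat, xs) for xs in all_subsets()))  # True
```

Enumerating all $2^{10}$ subsets only makes sense for a toy example. The code is meant to make the definition of forward completeness concrete, not to propose an algorithm.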



6 Completeness and Counterexamples 

We are now in a position to apply forward or backward complete abstract domain refinement to abstract (state-)model-checking problems. In this section we relate the abstract domain refinement for both backward and forward completeness to the presence of suitable counterexamples in abstract model-checking.
A path in a Kripke structure $M = (\Sigma, I, R, \|\cdot\|)$ is an infinite sequence in $\mathbb{N} \to \Sigma$, denoted $\pi = s_0, s_1, \ldots$, such that $s_0 \in I$ and, for every $i \in \mathbb{N}$, $R(s_i, s_{i+1})$ holds. Traditionally, a terminating execution has a final state which is repeated forever. With $\pi_i$ we denote the $i$-th state in the path, i.e., $s_i$. Given a Galois insertion $(\wp(\Sigma), \alpha, A, \gamma)$ and concrete and abstract paths $\pi$ and $\pi^\sharp$ respectively, we denote by $\alpha(\pi)$ the sequence $\alpha(\pi_0), \alpha(\pi_1), \ldots$ and by $\gamma(\pi^\sharp)$ the set $\{\pi \mid \alpha(\pi) = \pi^\sharp\}$. Given a path $\pi$ and a state $x \in \Sigma$, $x \in \pi$ iff $\exists i \in \mathbb{N}$ such that $x = \pi_i$. The forward predicate transformer associated with $R$ is $post[R] : \wp(\Sigma) \to \wp(\Sigma)$, defined as $post[R](X) = \{s \in \Sigma \mid \exists x \in X.\ R(x, s)\}$. Dually, the backward predicate transformer associated with $R$ is $pre[R] : \wp(\Sigma) \to \wp(\Sigma)$, defined as $pre[R](X) = \{s \in \Sigma \mid \forall x \in \Sigma.\ R(s, x) \Rightarrow x \in X\}$. The following is a well-known fact about forward and backward predicate transformers (see, e.g., [?]).

Proposition 3. $(\wp(\Sigma), post[R], \wp(\Sigma), pre[R])$ is a Galois connection.
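Proposition 3 states that $post[R]$ is left adjoint to $pre[R]$, i.e. $post[R](X) \subseteq Y \Leftrightarrow X \subseteq pre[R](Y)$. The following minimal Python sketch checks this exhaustively on a small relation. The four-state relation `R` is a hypothetical example chosen only for illustration.

```python
from itertools import combinations

SIGMA = frozenset({0, 1, 2, 3})
R = {(0, 1), (1, 2), (2, 2), (3, 0), (3, 2)}  # hypothetical example relation

def post(xs):
    """post[R](X) = { s | exists x in X. R(x, s) }"""
    return frozenset(s for (x, s) in R if x in xs)

def pre(ys):
    """pre[R](Y) = { s | forall x. R(s, x) => x in Y } (universal predecessor)"""
    return frozenset(s for s in SIGMA
                     if all(t in ys for (u, t) in R if u == s))

def subsets(universe):
    items = sorted(universe)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)

def is_galois_connection():
    """Check post(X) <= Y  iff  X <= pre(Y) for every pair of subsets."""
    return all((post(x) <= y) == (x <= pre(y))
               for x in subsets(SIGMA) for y in subsets(SIGMA))

print(is_galois_connection())  # True
```

The universal quantifier in `pre` is the essential design choice. With the existential predecessor in its place, the adjunction would fail for states that have several successors, such as state 3 here.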

Let us consider a domain refinement which is specified as the $\mathcal{F}$/$\mathcal{B}$-complete shell with respect to a continuous function $f : \wp(\Sigma) \to \wp(\Sigma)$ as given in Theorem 1. Let $R_f \subseteq \Sigma \times \Sigma$ be the relation associated with $f$, i.e. such that $R_f(x, y) \Leftrightarrow y \in f(\{x\})$. When $f$ is additive, it is immediate to observe that $f = post[R_f]$.

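Proposition 4. Let $f : \wp(\Sigma) \to \wp(\Sigma)$ be additive, and let $f^+$ denote its right adjoint. Then, for any $\rho \in uco(\wp(\Sigma))$, we have

$$\mathcal{B}(\rho, f) = \mathcal{B}(\rho, post[R_f]) = \mathcal{F}(\rho, f^+) = \mathcal{F}(\rho, pre[R_f]).$$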
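The relation $R_f$ and the identities behind Proposition 4 can be tested on a small state space. This is our own sketch, and the functions `f` and `g` on four states are hypothetical. `right_adjoint` computes $f^+(Y) = \bigcup\{X \mid f(X) \subseteq Y\}$. For the additive `f` it agrees with $pre[R_f]$, while for the non-additive `g` the equality $g = post[R_g]$ fails.

```python
from itertools import combinations

SIGMA = frozenset(range(4))

def subsets(universe):
    items = sorted(universe)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)

def relation_of(f):
    """R_f(x, y) iff y in f({x})."""
    return {(x, y) for x in SIGMA for y in f(frozenset({x}))}

def post_of(rel):
    return lambda xs: frozenset(y for (x, y) in rel if x in xs)

def pre_of(rel):
    # universal predecessor: every successor of s lies in Y
    return lambda ys: frozenset(s for s in SIGMA
                                if all(t in ys for (u, t) in rel if u == s))

def right_adjoint(f):
    """f+(Y) = union of all X with f(X) <= Y (the right adjoint when f is additive)."""
    return lambda ys: frozenset().union(*(x for x in subsets(SIGMA) if f(x) <= ys))

def same(g, k):
    return all(g(xs) == k(xs) for xs in subsets(SIGMA))

# Hypothetical additive f: direct image under x -> (x + 1) mod 4.
f = lambda xs: frozenset((x + 1) % 4 for x in xs)
rf = relation_of(f)
print(same(f, post_of(rf)))                # True:  f = post[R_f]
print(same(right_adjoint(f), pre_of(rf)))  # True:  f+ = pre[R_f]

# Hypothetical non-additive g: adds 0 to every set with at least two elements.
g = lambda xs: (xs | {0}) if len(xs) >= 2 else xs
print(same(g, post_of(relation_of(g))))    # False: g != post[R_g]
```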

Traditionally, a model-checker is used to determine whether a temporal formula holds in a model. If the answer is negative, i.e., the model does not possess the property specified by the formula, then the model-checker generates a counterexample. Let $\mathcal{R} : uco(C) \to uco(C)$ be an abstract domain refinement, i.e., a monotone operator such that $\mathcal{R}(X) \sqsubseteq X$ for any abstract domain $X \in uco(C)$ [13]. It is immediate to observe that $\mathcal{R}$ reduces the number of counterexamples that might be generated by the model-checker. Recall that, given a Kripke structure $M = (\Sigma, I, R, \|\cdot\|)$ and a formula $\varphi \in \forall CTL^*$, $M \not\models \varphi$ iff $\exists s \in I.\ s \not\models \varphi$, iff there exists a path starting in $I$ which does not satisfy $\varphi$. This is because $\forall CTL^*$ can only express formulas describing behaviors that must hold on all paths from the initial states. Hence, if $A$ is an abstract domain and $M^\sharp_A$ and $M^\sharp_{\mathcal{R}(A)}$ are respectively the abstract and refined Kripke structures associated with $A$ and $\mathcal{R}(A)$, then we have that:

$$\{\varphi \mid M^\sharp_A \not\models \varphi\} \supseteq \{\varphi \mid M^\sharp_{\mathcal{R}(A)} \not\models \varphi\}.$$

It is therefore natural to consider counterexamples as a measure of the precision achieved in abstract model-checking when the abstract domain is refined. However, not all counterexamples are admissible, because an abstract counterexample might not be valid in the concrete model. Clarke et al. [2] introduced the notion of spurious counterexample in order to model this situation*.

* Note that we consider path counterexamples only. Clarke et al. [2] instead consider loop counterexamples, which reduce to paths by means of loop unfolding.
Definition 5. Let $(\wp(\Sigma), \alpha, \Sigma^\sharp, \gamma)$ be a GI. Consider the concrete and abstract Kripke structures $(\Sigma, I, R, \|\cdot\|)$ and $(\Sigma^\sharp, I^\sharp, R^\sharp, \|\cdot\|^\sharp)$ and $\varphi \in \forall CTL^* \cap LTL$. An abstract path $\pi^\sharp$ is a spurious counterexample if $\pi^\sharp_0 \not\models \varphi$ and $\gamma(\pi^\sharp) = \emptyset$.

The following theorem relates spurious counterexamples to backward and forward completeness with respect to $post[R]$ and $pre[R]$, respectively. The idea is that if a counterexample is spurious for an abstraction $\rho$, then there exists a concrete state $x$ at which $\rho$ fails to be backward complete for $post[R]$. The proof relies on the fact that, for a spurious counterexample $\pi^\sharp$, $\gamma(\pi^\sharp) = \emptyset$. Therefore there exists a concrete state $x$ from which there is no forward transition into the concretization of the next abstract state of $\pi^\sharp$, while there is such a transition from $\rho(\{x\})$. This clearly violates completeness for $post[R]$.

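Theorem 4. Let $(\wp(\Sigma), \alpha, \Sigma^\sharp, \gamma)$ be a GI and $\rho = \gamma \circ \alpha$. Let $\varphi \in \forall CTL^*$ and let $\pi^\sharp$ be a counterexample. If $\pi^\sharp$ is spurious, then there exists $i \in \mathbb{N}$ such that for every $x \in \gamma(\pi^\sharp_i)$ reachable from the states in $I$, i.e., from $\gamma(\pi^\sharp_0) \cap I$, it holds that:

$$\rho(post[R](\rho(\{x\}))) \neq \rho(post[R](\{x\})).$$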

In particular, by Theorem 4, if $\pi^\sharp$ is spurious then there exist $i \in \mathbb{N}$ and $x \in \gamma(\pi^\sharp_i)$ such that $\rho(post[R](\rho(\{x\}))) \neq \rho(post[R](\{x\}))$, or equivalently, by Proposition 4, such that $\rho(pre[R](\rho(\{x\}))) \neq pre[R](\rho(\{x\}))$.

The complete shell abstract domain refinement can therefore be used to systematically refine abstract model-checking in order to eliminate spurious counterexamples. In particular, the machinery developed in [2] to eliminate spurious counterexamples in abstract Kripke structures can be formally reduced to the complete shell refinement operation introduced in [?]. The following example, taken from [2], shows this phenomenon.

Example 3. Consider a program with a variable whose values range in the interval $D = [1, 12]$ and which is represented by a transition system with states $\Sigma = \{1, \ldots, 12\}$, initial states $I = \{1,2,3,7,8,10,11\}$ and transition relation:

$$R = \left\{ \begin{array}{l} (1,4), (2,5), (3,6), (4,9), (5,9), (6,6), (7,12), \\ (8,8), (9,9), (10,10), (11,11), (12,12) \end{array} \right\}$$

Let $\Sigma^\sharp = \{1, 2, 3, 4\}$ be the set of four abstract states, corresponding to the equivalence classes of concrete states $\{1,2,3\}$, $\{4,5,6\}$, $\{7,8,9\}$, and $\{10,11,12\}$. Suppose that a model-checker gives the abstract counterexample $\pi^\sharp = 1,2,3,4$. It is easy to see that $\pi^\sharp$ is spurious: this abstract trace does not correspond to any concrete trace of the transition system. By Theorem 4, we note that $\gamma(3) = \{7,8,9\}$ and that only the concrete state 9 of this set is reachable along $\pi^\sharp$ from the initial states in $I$. However, 9 has no transition into a different concrete state, and therefore no transition to a state abstracted by 4.
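For a finite abstract path prefix such as the one in Example 3, spuriousness can be decided by pushing the set of compatible concrete states forward along the path. The sketch below assumes, as in the example, that each concrete state has exactly one successor, so $R$ is stored as a Python dict. The helper name `frontiers` is ours.

```python
# Example 3: every concrete state has exactly one successor.
R = {1: 4, 2: 5, 3: 6, 4: 9, 5: 9, 6: 6, 7: 12,
     8: 8, 9: 9, 10: 10, 11: 11, 12: 12}
INIT = {1, 2, 3, 7, 8, 10, 11}
GAMMA = {1: {1, 2, 3}, 2: {4, 5, 6}, 3: {7, 8, 9}, 4: {10, 11, 12}}

def frontiers(abstract_path):
    """Concrete states reachable along a finite abstract path prefix.

    frontier[i] holds the states of gamma(path[i]) that some concrete path,
    starting in INIT and matching path[0..i], can reach. An empty frontier
    means that no concrete path follows the prefix, i.e. it is spurious.
    """
    frontier = GAMMA[abstract_path[0]] & INIT
    result = [frontier]
    for a in abstract_path[1:]:
        frontier = {R[s] for s in frontier} & GAMMA[a]
        result.append(frontier)
    return result

print([sorted(fr) for fr in frontiers([1, 2, 3, 4])])  # [[1, 2, 3], [4, 5, 6], [9], []]
print([sorted(fr) for fr in frontiers([1, 2, 3])])     # [[1, 2, 3], [4, 5, 6], [9]]
```

The last non-empty frontier is $\{9\}$ at abstract state 3. This is the state singled out in the example.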

Note that if there exist $\pi \in \gamma(\pi^\sharp)$ and $x \in \pi$ such that $\rho(post[R](\{x\})) \neq \rho(post[R](\rho(\{x\})))$, this does not necessarily imply that $\pi^\sharp$ is spurious. Consider for instance the abstract path $\pi^\sharp = 1,2,3$ of Example 3. It is easy to check that $\rho(post[R](\rho(\{6\}))) = [4,9]$, whereas $\rho(post[R](\{6\})) = [4,6]$, but $\pi^\sharp$ corresponds to the concrete paths 1,4,9 and 2,5,9. Therefore, the complete shell may refine the domain more than is strictly necessary to remove spurious counterexamples, as shown in the following example. This phenomenon can be controlled by checking for spurious counterexamples at each iteration of the $gfp$-computation of the complete shell.
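The pointwise incompleteness discussed above can be computed directly. This sketch is our own encoding of Example 3, with the closure $\rho$ induced by the four blocks. It evaluates both sides of the inequality of Theorem 4 and lists every state where they differ.

```python
R = {1: 4, 2: 5, 3: 6, 4: 9, 5: 9, 6: 6, 7: 12,
     8: 8, 9: 9, 10: 10, 11: 11, 12: 12}
BLOCKS = [frozenset({1, 2, 3}), frozenset({4, 5, 6}),
          frozenset({7, 8, 9}), frozenset({10, 11, 12})]

def rho(xs):
    """Closure induced by the partition: union of the blocks meeting xs."""
    return frozenset().union(*(b for b in BLOCKS if b & xs))

def post(xs):
    return frozenset(R[s] for s in xs)

def backward_complete_at(x):
    """Pointwise backward completeness: rho(post(rho({x}))) == rho(post({x}))."""
    return rho(post(rho({x}))) == rho(post({x}))

print(sorted(rho(post(rho({6})))))  # [4, 5, 6, 7, 8, 9], i.e. [4, 9]
print(sorted(rho(post({6}))))       # [4, 5, 6], i.e. [4, 6]
print([x for x in range(1, 13) if not backward_complete_at(x)])  # [4, 5, 6, 7, 8, 9]
```

Incompleteness holds at states 4 to 9, including state 6. Incompleteness at a state therefore does not, by itself, single out a spurious path.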

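Example 4. Consider the transition system in Example 3. Let $(\wp([1,12]), \alpha, A, \gamma)$ be the corresponding GI, with $A = \wp(\{[1,3], [4,6], [7,9], [10,12]\})$. By Theorem 1 and Proposition 4, the $\mathcal{B}$-complete shell of $\gamma \circ \alpha$ for $post[R]$, namely $gfp(\lambda X.\ \gamma \circ \alpha \sqcap \mathcal{M}(pre[R](X)))$, is obtained as the limit of the sequence of abstract domains defined as follows:

$$A^i = \begin{cases} A & \text{if } i = 0, \\ \mathcal{M}(pre[R](A^{i-1}) \cup A) & \text{otherwise.} \end{cases}$$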

This sequence of abstract domains converges in three steps to the fix-point. Each abstract domain $A^i$ induces a refined partition according to Proposition 2, as follows:



Domain    Refined partition
$A^0$     $\{[1,3], [4,6], [7,9], [10,12]\}$
$A^1$     $\{[1,3], [4,5], [6,6], [7,7], [8,9], [10,12]\}$
$A^2$     $\{[1,2], [3,3], [4,5], [6,6], [7,7], [8,9], [10,12]\}$ (fix-point)



Consider the partition induced by $A^1$. The new abstract states $\{4,5\}$ and $\{7\}$ are introduced by computing the maximal inverse image of $post[R]$ in the states $\{7,8,9\}$ and $\{10,11,12\}$, respectively, whereas the states $\{6\}$ and $\{8,9\}$ are added to obtain a partition of $[1,12]$. Now consider the partition induced by $A^2$. The new abstract states $\{3\}$ and $\{1,2\}$ are introduced by computing the inverse image of $post[R]$ in the states $\{6\}$ and $\{4,5\}$ of $A^1$, respectively.

It is worth noting that the partition induced by $A^1$ is already sufficient to remove every spurious counterexample. Therefore the refinement process can be stopped after the first iteration. This condition can be verified by checking for counterexamples at each iterate of the fix-point computation.
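The iteration of Example 4 can be replayed mechanically. This sketch rests on two assumptions. First, it represents each domain $A^i$ only through the partition it induces (Proposition 2). Second, it applies $pre[R]$ only to the blocks of the previous partition. That suffices here because $R$ is a total function, so $pre[R]$ is an inverse image and preserves unions and intersections. The function names are hypothetical.

```python
R = {1: 4, 2: 5, 3: 6, 4: 9, 5: 9, 6: 6, 7: 12,
     8: 8, 9: 9, 10: 10, 11: 11, 12: 12}
SIGMA = frozenset(range(1, 13))
A0 = [frozenset(range(1, 4)), frozenset(range(4, 7)),
      frozenset(range(7, 10)), frozenset(range(10, 13))]

def pre(xs):
    """Universal predecessor pre[R]; since R is a total function here,
    it coincides with the inverse image of xs."""
    return frozenset(s for s in SIGMA if R[s] in xs)

def partition_of(generators):
    """Partition induced by the Moore closure of `generators`: two states are
    equivalent iff they belong to exactly the same generator sets."""
    classes = {}
    for s in sorted(SIGMA):
        signature = tuple(s in g for g in generators)
        classes.setdefault(signature, []).append(s)
    return sorted(classes.values())

def shell_iterates(a0):
    """Partitions induced by A^0, A^1, ... with A^i = M(pre[R](A^{i-1}) U A^0),
    stopping as soon as the partition no longer changes (the fix-point)."""
    history = [partition_of(a0)]
    blocks = a0
    while True:
        generators = list(a0) + [pre(b) for b in blocks]
        partition = partition_of(generators)
        if set(map(frozenset, partition)) == set(blocks):
            return history
        blocks = [frozenset(c) for c in partition]
        history.append(partition)

for i, p in enumerate(shell_iterates(A0)):
    print(i, p)
# 0 [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
# 1 [[1, 2, 3], [4, 5], [6], [7], [8, 9], [10, 11, 12]]
# 2 [[1, 2], [3], [4, 5], [6], [7], [8, 9], [10, 11, 12]]
```

The printed partitions match the table of Example 4. Stopping after $A^1$, as suggested in the text, would require an extra spuriousness test at each step, which this sketch does not perform.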



7 Complete Data-Flow Analysis: A Different Perspective 

In this section we sketch, by means of a simple example, how we can design 
sound and complete data-flow analysis algorithms as complete model-checking 
of abstract interpretations. Once again, completeness is achieved by removing 
spurious counterexamples. 

The idea of data-flow analysis as model-checking goes back to Steffen’s pioneering work [?]. More recently, Schmidt [?] showed that data-flow analysis problems can be formulated as model-checking problems of a trace-based abstract interpretation of operational semantics. Intuitively, the concrete semantics of a program is a tree whose paths represent execution traces and whose nodes display the program’s changing states. The abstract interpretation of the semantics of a program is also a computation tree, but its nodes contain an abstraction of the information held in concrete states. To be useful, a program's abstract computation tree must safely simulate the concrete computation tree that it represents. Thus any property that holds in an abstract tree must also hold for all the corresponding concrete trees. However, the dual notion of completeness does not hold in general: a property that fails in the abstract tree need not fail in the concrete trees, i.e., abstract counterexamples need not be concrete ones.




Concrete semantics transitions (Val = Int):

$0 \vdash x = 0 \;\to\; 0 \vdash x := succ(x)$
$0 \vdash x := succ(x) \;\to\; 1 \vdash x = 0$
$+(n+1) \vdash x = 0 \;\to\; +(n+1) \vdash x > 0$
$+(n+1) \vdash x > 0 \;\to\; +(n+1) \vdash x := -x$
$+(n+1) \vdash x := -x \;\to\; -(n+1) \vdash x > 0$
$-(n+1) \vdash x = 0 \;\to\; -(n+1) \vdash x > 0$
$-(n+1) \vdash x > 0 \;\to\; -(n+1) \vdash x := 0$
$-(n+1) \vdash x := 0 \;\to\; 0 \vdash x = 0$

Fig. 3. Flowchart and Concrete Interpretation.



Let us consider the flowchart program depicted in Fig. 3. Its concrete semantics is expressed as a set of transition rules between program states. A program state is an element of $Val \times ProgramPoint$ and is written as $n \vdash p$, where $n$ is the value of the variable $x$ at program point $p$. Note that in this example the concrete computation tree is an infinite execution trace.

The fundamental idea in Schmidt’s trace-based abstract interpretation [?] is to replace an execution trace of the program by transition rules that use abstract values instead of run-time values. The result is an abstract computation tree, where multiple possible execution traces are represented by non-deterministic branching in the abstract tree.

In the example above we consider a program analysis which detects whether $x$ is zero or not. This corresponds to replacing the concrete domain $Val = Int$ by $AbsVal = \{0, \neq 0\}$ and rewriting the state transition rules accordingly. More precisely, we consider a function $\beta : Val \to AbsVal$ which maps concrete values to their most precise abstractions. For the example in Fig. 3, we have $\beta(0) = 0$ and $\beta(\pm(n+1)) = {\neq}0$. Moreover, the following homomorphism property is necessary for the abstract transitions to correctly simulate the concrete ones: if $(v \vdash p) \to (v' \vdash p')$ is a concrete transition, then there must exist an abstract transition $(\beta(v) \vdash p) \to (a' \vdash p')$ such that $\beta(v') \leq a'$, where $\leq$ is the approximation order between abstract values. The resulting abstract semantics transitions are shown in Fig. 4:



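Abstract semantics transitions:

$0 \vdash x = 0 \;\to\; 0 \vdash x := succ(x)$
$0 \vdash x := succ(x) \;\to\; {\neq}0 \vdash x = 0$
${\neq}0 \vdash x = 0 \;\to\; {\neq}0 \vdash x > 0$
${\neq}0 \vdash x > 0 \;\to\; {\neq}0 \vdash x := -x$
${\neq}0 \vdash x > 0 \;\to\; {\neq}0 \vdash x := 0$
${\neq}0 \vdash x := -x \;\to\; {\neq}0 \vdash x > 0$
${\neq}0 \vdash x := 0 \;\to\; 0 \vdash x = 0$

Refined abstract semantics transitions:

$0 \vdash x = 0 \;\to\; 0 \vdash x := succ(x)$
$0 \vdash x := succ(x) \;\to\; + \vdash x = 0$
$+ \vdash x = 0 \;\to\; + \vdash x > 0$
$+ \vdash x > 0 \;\to\; + \vdash x := -x$
$+ \vdash x := -x \;\to\; - \vdash x > 0$
$- \vdash x = 0 \;\to\; - \vdash x > 0$
$- \vdash x > 0 \;\to\; - \vdash x := 0$
$- \vdash x := 0 \;\to\; 0 \vdash x = 0$

Fig. 4. Abstract Interpretation of Flowchart.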
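The homomorphism property required of the abstract transitions can be checked on the reachable part of the concrete semantics. In this sketch the program-point labels (`TEST0`, `SUCC`, `POS`, `NEG`, `ZERO`) are our own names for the five flowchart nodes. The abstract values are the strings `"0"` and `"!=0"`. We also assume that the approximation order on this flat two-value domain is plain equality.

```python
TEST0, SUCC, POS, NEG, ZERO = "x = 0", "x := succ(x)", "x > 0", "x := -x", "x := 0"

def concrete_step(v, p):
    """One concrete transition (v |- p) -> (v' |- p'), following Fig. 3."""
    if p == TEST0:
        return (v, SUCC) if v == 0 else (v, POS)
    if p == SUCC:
        return (v + 1, TEST0)
    if p == POS:
        return (v, NEG) if v > 0 else (v, ZERO)
    if p == NEG:
        return (-v, POS)
    if p == ZERO:
        return (0, TEST0)
    raise ValueError(f"unknown program point: {p}")

# Coarse abstract transitions of Fig. 4 over AbsVal = {0, !=0}.
ABSTRACT = {
    ("0", TEST0): {("0", SUCC)},
    ("0", SUCC): {("!=0", TEST0)},
    ("!=0", TEST0): {("!=0", POS)},
    ("!=0", POS): {("!=0", NEG), ("!=0", ZERO)},
    ("!=0", NEG): {("!=0", POS)},
    ("!=0", ZERO): {("0", TEST0)},
}

def beta(v):
    return "0" if v == 0 else "!=0"

def reachable_concrete(start=(0, TEST0)):
    """States of the deterministic concrete trace, up to its first repetition."""
    seen, state = [], start
    while state not in seen:
        seen.append(state)
        state = concrete_step(*state)
    return seen

def homomorphic():
    """Each reachable concrete step must be matched by an abstract step
    (beta(v) |- p) -> (beta(v') |- p'), with equality as the order on AbsVal."""
    for v, p in reachable_concrete():
        v2, p2 = concrete_step(v, p)
        if (beta(v2), p2) not in ABSTRACT.get((beta(v), p), set()):
            return False
    return True

print(len(reachable_concrete()))  # 7
print(homomorphic())              # True
```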



Consider the CTL formula $\varphi = \forall G \forall F (x = 0)$, specifying that from every program state in any trace, a program state where the value of the variable $x$ is 0 is eventually reached. It is easy to check that the abstract tree does not satisfy this property, because there exists an infinite trace ${\neq}0 \vdash x = 0 \to {\neq}0 \vdash x > 0 \to {\neq}0 \vdash x := -x \to {\neq}0 \vdash x > 0 \to \cdots$ that invalidates the specification. In this case the abstraction is too coarse: this infinite abstract trace does not correspond to any concrete trace, i.e., we have found a spurious counterexample for $\varphi$.

In order to find a sound and complete trace-based abstract interpretation, we have to refine the abstraction. By applying the backward complete shell refinement for $post[R]$, with $R$ being the transition relation specified above, we obtain a refined model which includes the abstract values $\{0, +, -, \neq 0\}$. In this case, the abstraction function is such that $\beta(0) = 0$, $\beta(+(n+1)) = +$ and $\beta(-(n+1)) = -$. The resulting refined abstract semantics transitions, according to Definition 4, are shown in Fig. 4. It is easy to check that the refined abstract tree now satisfies the property $\varphi$.
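The failure of $\forall G \forall F(x = 0)$ on the coarse abstraction, and its success on the refined one, can be checked with the usual fixpoint characterization. $\forall F\, p$ fails at a state iff some infinite path from that state avoids $p$, i.e. iff the state satisfies $\exists G\, \neg p$, computed as a greatest fixpoint. This sketch assumes that every reachable state has a successor, which holds for both abstract graphs of Fig. 4. The state encoding is our own.

```python
TEST0, SUCC, POS, NEG, ZERO = "x = 0", "x := succ(x)", "x > 0", "x := -x", "x := 0"

# Abstract transitions of Fig. 4; states are (abstract value, program point).
COARSE = {
    ("0", TEST0): {("0", SUCC)},
    ("0", SUCC): {("!=0", TEST0)},
    ("!=0", TEST0): {("!=0", POS)},
    ("!=0", POS): {("!=0", NEG), ("!=0", ZERO)},
    ("!=0", NEG): {("!=0", POS)},
    ("!=0", ZERO): {("0", TEST0)},
}
REFINED = {
    ("0", TEST0): {("0", SUCC)},
    ("0", SUCC): {("+", TEST0)},
    ("+", TEST0): {("+", POS)},
    ("+", POS): {("+", NEG)},
    ("+", NEG): {("-", POS)},
    ("-", TEST0): {("-", POS)},
    ("-", POS): {("-", ZERO)},
    ("-", ZERO): {("0", TEST0)},
}

def reachable(graph, start):
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for t in graph.get(s, ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def eg_nonzero(graph, states):
    """Greatest fixpoint Z = { s in states | value(s) != "0" and some successor in Z }:
    the states with an infinite path that never shows the value 0."""
    z = {s for s in states if s[0] != "0"}
    while True:
        smaller = {s for s in z if any(t in z for t in graph.get(s, ()))}
        if smaller == z:
            return z
        z = smaller

def ag_af_zero(graph, start=("0", TEST0)):
    """AG AF (x = 0) holds iff no reachable state satisfies EG (x != 0)."""
    return not eg_nonzero(graph, reachable(graph, start))

print(ag_af_zero(COARSE))   # False: x > 0 and x := -x can alternate forever
print(ag_af_zero(REFINED))  # True
```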



8 Conclusions 

In this work we have studied the impact of the notions of forward and backward completeness on model-checking based on abstract interpretation. Our results allow us to apply standard abstract interpretation theory to refine abstract models when they turn out to be too coarse for verifying temporal properties of interest. We have proved that the precision of the abstract model-checking problem for a Kripke structure $M = (\Sigma, I, R, \|\cdot\|)$ is closely connected to the precision of the underlying abstract domain, and that spurious counterexamples can be removed by considering the complete shell of the abstract domain with respect to the $post[R]$ predicate transformer.

It is particularly interesting to study which kinds of counterexamples can be removed when the refinement is computed with respect to an arbitrary function $f$, possibly different from $post[R]$. In this case our results allow us to formally derive the structure of those counterexamples that can be removed by an arbitrary domain refinement operation, provided that this operation can be encoded as a complete shell, namely that it corresponds to the complete shell refinement with respect to a given function $f$. It is easy to see that if $M^\sharp$ is an abstract Kripke structure associated with an abstract domain $A$ and a concrete transition system $(\Sigma, R)$, the complete shell refinement with respect to $f$ returns a refined Kripke structure (according to Proposition 2) which, for a given formula $\varphi$, removes all the counterexamples that correspond to the traces in a modified (simplified) transition system $(\Sigma, R \cap R_f)$. This new transition system includes only those transitions that belong both to the original relation $R$ and to the relation $R_f$ associated with $f$. The consequences of this observation for static program analysis by abstract model-checking are under investigation.
Abstract. Language-based security leverages program analysis and program rewriting in enforcing security policies. The approach promises efficient enforcement of fine-grained access-control policies, and it seems to require a trusted computing base of only modest size. This talk discusses progress and prospects for the area. Viewing traditional security problems through the lens of programming language research invites novel uses of various well-understood results from the area. It also provides reason to revisit assumptions and research directions that have been driving forces in languages research.
Abstract. Distributed message-passing based asynchronous systems are
becoming increasingly important. Such systems are notoriously hard to 
design and test. A promising approach to help programmers design such 
programs is to provide a behavioral type system that checks for behav- 
ioral properties such as deadlock freedom using a combination of type 
inference and model checking. The fundamental challenge in making a 
behavioral type system work for realistic concurrent programs is state 
explosion. This paper develops the theory to design a behavioral module 
system that permits decomposing the type checking problem, saving ex- 
ponential cost in the analysis. Unlike module systems for sequential pro- 
gramming languages, a behavioral specification for a module typically 
assumes that the module operates in an appropriate concurrent context. 
We identify assume-guarantee reasoning as a fundamental principle in 
designing such a module system. 

Concretely, we propose a behavioral module system for π-calculus
programs. Types are CCS processes that correctly approximate the behavior
of programs, and by applying model checking techniques to process types 
one can check many interesting program properties, including deadlock- 
freedom and communication progress. We show that modularity can be 
achieved in our type system by applying circular assume-guarantee rea- 
soning principles whose soundness requires an induction over time. We 
state and prove an assume-guarantee rule for CCS. Our module system 
integrates this assume-guarantee rule into our behavioral type system. 



1 Introduction 

Many computing systems today are built in a distributed, wide-area setting,
using an asynchronous message-passing programming model. These programs 
are notoriously hard to design and test, due to inherent difficulties in dealing 
with concurrency. Better programming languages and programming tools for 
building such programs are becoming increasingly important. 

In hardware and protocol design, there has been success in modeling differ- 
ent agents as communicating finite state machines, and using model checking 
to explore the interactions between the agents. However, agents in concurrent 
software tend to have more complicated communication structure than their 
counterparts in hardware. Indirect references and dynamic creation of new ob- 
jects play a prominent role in interactions between software agents. For instance, 
one agent can create a new object and send the object’s reference to a second 
agent. Following this, both agents can read or change the object’s contents. Such 
interactions are typically hard to model using communicating finite state machines. The π-calculus provides a simple way to model such interactions. The combination of fresh name generation and channel passing allows faithful modeling of several complicated communication patterns between software agents. However, in spite of its simple semantics, it is hard to automatically analyze π-calculus programs for checking behavioral properties. Recently, there has been considerable interest in designing so-called behavioral type systems for statically checking important behavioral properties such as deadlock freedom and communication progress for π-calculus programs. Behavioral type systems use type inference to extract behavioral abstractions of the program, called behavioral types, and use model checking to explore the state space of these behavioral types. The fundamental obstacle in making a behavioral type system scale is the exponential state space explosion in model checking. The only hope for dealing with state explosion on realistic programs is to partition the type checking and model checking problems to operate on pieces of the program, thereby saving an exponential amount of analysis time. We develop the theory required to design a behavioral module system, which makes such partitioning possible.

Our work is inspired by the behavioral type systems proposed by Igarashi and Kobayashi [?]. Here, types are CCS-like processes that correctly approximate the behavior of π-calculus programs, and types are inferred from programs. A model checker is used as a subroutine inside the typechecker for checking interesting program properties, including deadlock-freedom and communication progress. In this paper, we propose to incorporate assume-guarantee reasoning to enable modular type checking in such a system. Assume-guarantee
reasoning allows the programmer to state behavioral abstractions of a module 
that hold only in contexts where the module will actually be used. 

Since our types are CCS processes, we need an assume-guarantee rule that 
works for CCS. All known assume-guarantee rules require the process calculus to 
have a nonblocking semantics. Since CCS processes can block, previous assume- 
guarantee results are not directly applicable. This paper has three technical 
contributions: 

— We state and prove an assume-guarantee rule for CCS. 

— We propose a behavioral type system for π-calculus in which types are CCS
processes. Our type system is a variant of Igarashi and Kobayashi’s type 
system, and it includes name restriction in the process types. 

— We show that name restriction in CCS allows for a natural integration of 
our assume-guarantee rule into the behavioral type system. 

There are significant technical hurdles in designing a behavioral module sys- 
tem for a concurrent programming language. Module systems for sequential pro- 
gramming languages such as ML allow the user to specify abstractions of modules 
using type signatures. Module systems tactfully combine analysis and user anno- 
tation to partition type checking. A type signature of a module in ML is typically 
independent of the sequential context where the module is used. However, it is 
often impossible to state useful behavioral abstractions of a module that hold in 
all concurrent contexts. This phenomenon is well known in the specification and 
verification of reactive systems [?]. Thus, we need to allow the user to state behavioral abstractions of a module that hold only in contexts where the module will actually be used. The resulting module system needs to reason about a module using behavioral abstractions of its environments. For instance, if we have two concurrent modules P and Q with behavioral specifications P' and Q', then we assume Q' as the environment for establishing that P' is a correct abstraction of P. Similarly, we assume P' as the environment for establishing that Q' is a correct abstraction of Q. Since behavioral abstractions are used circularly to reason about each other, the soundness of the reasoning needs to be established. Such circular proof rules are called assume-guarantee (A-G) rules, and their soundness requires an induction over time. We identify assume-guarantee reasoning as a fundamental principle in designing a behavioral module system.

$S = \mu\alpha.(x?.m!.a?.\alpha + a?.(m!)^*)$
$U_S = \mu\gamma.(x!.\gamma)$
$Sender = \nu x.(S \mid U_S)$
$\overline{Sender} = \mu\alpha.(m!.a?.\alpha)$

$R = \mu\beta.(m?.(y!.a!.\beta + m?.(a!)^*))$
$U_R = \mu\delta.(y?.\delta)$
$Receiver = \nu y.(R \mid U_R)$
$\overline{Receiver} = \mu\beta.(m?.a!.\beta)$

$System = \nu m, a.(Sender \mid Receiver)$

Fig. 1. A Sender and Receiver in CCS.

The remainder of the paper is organized as follows. Section 2 contains two 
examples illustrating various aspects of our module system. In Section 3 we 
state and prove an assume-guarantee rule for CCS. In Section 4 we propose a 
behavioral module system for π-calculus. In Section 5 we discuss related work,
and Section 6 concludes the paper. 

2 Examples 

We show two examples, one to illustrate the assume-guarantee rule and one to illustrate our type system. We follow the syntax for CCS and π-calculus from [?].

Figure 1 shows a Sender process sending messages to a Receiver process. The Sender and Receiver communicate through a message channel $m$ and an acknowledgement channel $a$, which are known a priori to both processes. Sender consists of a process $S$ that does the actual communication, and a local user process $U_S$ which is consulted before every message transmission. Receiver consists of a process $R$ that does the actual communication, and a local user process $U_R$ which is consulted after every message reception. Suppose we want to check a safety property of System such as deadlock freedom, specified by a temporal-logic formula $\phi$. One way to do this is to explore the state space of System using a model checker. In order to alleviate state explosion, it is useful to write abstractions of the components of the system, and run the model checker on each component separately. If the user writes abstract specifications $\overline{Sender}$ and $\overline{Receiver}$ for the sender and receiver respectively, one could attempt to use the following compositional proof rule to avoid exploring the state space of System.



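$$\frac{Sender \sqsubseteq \overline{Sender} \qquad Receiver \sqsubseteq \overline{Receiver} \qquad (\eta m, a)(\overline{Sender} \mid \overline{Receiver}) \models \phi}{(\eta m, a)(Sender \mid Receiver) \models \phi}\ [COMP]$$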



The restriction operator $\eta$ in $(\eta m, a)(Sender \mid Receiver)$ prevents the environment from interacting with Sender and Receiver through the channels $m$ and $a$. For present purposes, it can be taken to be the same as the name restriction operator of [?] for CCS.¹

The obligation $Sender \sqsubseteq \overline{Sender}$ requires that every behavior of Sender is a possible behavior of $\overline{Sender}$. Note that the interaction between the component processes $S$ and $U_S$ has been abstracted away in the specification $\overline{Sender}$. Thus, the state space of $\overline{Sender}$ is smaller than that of Sender. However, the problem with [COMP] is that, in fact, $Sender \not\sqsubseteq \overline{Sender}$, since Sender can send arbitrary messages if acknowledgements arrive at unexpected times, whereas $\overline{Sender}$ ignores spurious acknowledgements. Also, $Receiver \not\sqsubseteq \overline{Receiver}$ for similar reasons. Since these obligations do not hold, the rule [COMP] cannot be used to prove that System does not deadlock.

The abstract process $\overline{Sender}$ is a correct abstraction of Sender only in an appropriate environment. Similarly, the abstract process $\overline{Receiver}$ is a correct abstraction of Receiver only in an appropriate environment. The assume-guarantee proof rule shown below allows the Sender and Receiver to be analyzed in composition with their abstract environments:



$$\frac{(\eta m, a)(Sender \mid \overline{Receiver}) \sqsubseteq \overline{Sender} \qquad (\eta m, a)(\overline{Sender} \mid Receiver) \sqsubseteq \overline{Receiver} \qquad (\eta m, a)(\overline{Sender} \mid \overline{Receiver}) \models \phi}{(\eta m, a)(Sender \mid Receiver) \models \phi}\ [AG]$$

Note that the obligations of the [AG] rule require the Sender to conform with $\overline{Sender}$ only in the environment provided by $\overline{Receiver}$. Similarly, Receiver is required to conform with $\overline{Receiver}$ only in the environment provided by $\overline{Sender}$. Thus, a model checker can discharge the obligations of the [AG] rule and prove deadlock freedom of System without having to explore the entire state space of System directly. The soundness of such a proof rule requires certain side conditions expressing progress, and is established using an induction over time. In Section 3 we state such side conditions and prove this rule for CCS.

¹ The reason we use the notation $\eta$ is technical and will be explained later.

Left (π-calculus processes):

$S(m,a) = \mu\alpha.(x?.m!.a?.\alpha + a?.(m!)^*)$
$U_S = \mu\gamma.(x!.\gamma)$
$Sender(m,a) = \nu x.(S(m,a) \mid U_S)$
$R(m,a) = \mu\beta.(m?.(y!.a!.\beta + m?.(a!)^*))$
$U_R = \mu\delta.(y?.\delta)$
$Receiver(m,a) = \nu y.(R(m,a) \mid U_R)$
$Vendor(m,a) = www![m,a].Sender(m,a)$
$Customer = www?[m,a].Receiver(m,a)$
$System = \nu www.((\nu msg, ack.\ Vendor(msg, ack)) \mid Customer)$

Right (process types):

$\tau_{S(m,a)} = \mu\alpha.(x?.m!.a?.\alpha + a?.(m!)^*)$
$\tau_{U_S} = \mu\gamma.(x!.\gamma)$
$\tau_{Sender(m,a)} = \nu x.(\tau_{S(m,a)} \mid \tau_{U_S})$
$\overline{\tau}_{Sender(m,a)} = \mu\alpha.(m!.a?.\alpha)$
$\tau_{R(m,a)} = \mu\beta.(m?.(y!.a!.\beta + m?.(a!)^*))$
$\tau_{U_R} = \mu\delta.(y?.\delta)$
$\tau_{Receiver(m,a)} = \nu y.(\tau_{R(m,a)} \mid \tau_{U_R})$
$\overline{\tau}_{Receiver(m,a)} = \mu\beta.(m?.a!.\beta)$
$\tau_{Vendor(m,a)} = www![(m,a)\tau_{Receiver(m,a)}].(\tau_{Sender(m,a)} \mid \tau_{Receiver(m,a)})$
$\tau_{Customer} = www?[(m,a)\tau_{Receiver(m,a)}]$

Fig. 2. A Sender-Receiver System in π-Calculus and Its Process Types.

Now suppose that the Sender is part of a vendor on the world wide web and the Receiver is part of a customer. A common situation is that the customer first goes to the vendor’s website and, after authentication, gets fresh channels (these could be fresh URLs) over which the transaction actually happens. Such an interaction
can be modeled using the channel name generation and channel passing capabilities of the π-calculus. Figure 2 shows a model of the above scenario in the π-calculus. If we want to check that the vendor process Vendor and the customer process Customer do not deadlock, we need to be able to handle channel passing in our analysis. A promising approach is to first use a type system to construct first-order approximations of the processes, called process types, and then use model checking on the process types. In Section 4 we build a type system inspired by the work of Igarashi and Kobayashi to abstract π-calculus processes using CCS processes as process types. The right side of Figure 2 shows the process types generated by the type system for each π-calculus process on the left. A model checker is used as a subroutine inside the type checker. In our type system, it turns out that the model checker is asked to check:

$$(\eta\, msg, ack)(\tau_{Sender(msg,ack)} \mid \tau_{Receiver(msg,ack)})$$

Here $\tau_{Sender(msg,ack)}$ is the process type for the $Sender(msg, ack)$ process and $\tau_{Receiver(msg,ack)}$ is the process type for the $Receiver(msg, ack)$ process. These process types are identical to the Sender and Receiver processes from Figure 1. The notation $z![\tau].P$ is used for the type of a process which sends a channel along $z$ and continues as $P$, where $\tau$ is a type describing the interactions that could possibly happen on the sent channel. If the user writes behavioral type specifications $\overline{\tau}_{Sender(msg,ack)}$ and $\overline{\tau}_{Receiver(msg,ack)}$ at the module interfaces for $Sender(msg, ack)$ and $Receiver(msg, ack)$, we can use our assume-guarantee rule to mitigate the state explosion that happens inside the type checker.
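The sketch below gives a rough idea of the kind of check delegated to the model checker. It tests the deadlock-freedom premise of [AG] on the abstract processes $\mu\alpha.(m!.a?.\alpha)$ and $\mu\beta.(m?.a!.\beta)$, composed under restriction on $m$ and $a$. This is our own finite-state encoding, not the paper's type checker. Each process is a hand-built labelled transition system, restricted channels admit only handshakes, and a deadlock is a reachable product state with no transition. The trace-containment premises are not checked.

```python
from collections import deque

# Hand-built LTSs (hypothetical encoding): state -> list of (channel, polarity, next).
SENDER_BAR = {0: [("m", "!", 1)], 1: [("a", "?", 0)]}    # mu alpha.(m!.a?.alpha)
RECEIVER_BAR = {0: [("m", "?", 1)], 1: [("a", "!", 0)]}  # mu beta.(m?.a!.beta)
RECEIVER_BAD = {0: [("m", "?", 1)], 1: [("m", "?", 0)]}  # mu beta.(m?.m?.beta)

def restricted_product(p, q, restricted):
    """Successor function of (eta restricted)(P | Q)."""
    def successors(state):
        sp, sq = state
        succ = []
        for c1, pol1, t1 in p[sp]:
            for c2, pol2, t2 in q[sq]:
                if c1 == c2 and pol1 != pol2:  # handshake: a tau step
                    succ.append((t1, t2))
        for c, _, t in p[sp]:
            if c not in restricted:            # P moves alone on a free channel
                succ.append((t, sq))
        for c, _, t in q[sq]:
            if c not in restricted:            # Q moves alone on a free channel
                succ.append((sp, t))
        return succ
    return successors

def deadlocks(successors, start=(0, 0)):
    """Reachable product states that have no transition at all."""
    seen, todo, stuck = {start}, deque([start]), []
    while todo:
        s = todo.popleft()
        nxt = successors(s)
        if not nxt:
            stuck.append(s)
        for t in nxt:
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return stuck

print(deadlocks(restricted_product(SENDER_BAR, RECEIVER_BAR, {"m", "a"})))  # []
print(deadlocks(restricted_product(SENDER_BAR, RECEIVER_BAD, {"m", "a"})))  # [(1, 1)]
```

Replacing the receiver by the hypothetical `RECEIVER_BAD`, which expects a second message before acknowledging, produces a deadlock at product state `(1, 1)`.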

Fig. 3. A Staged-Server in π-Calculus and Its Process Types. [The figure defines the processes Sem, StageA, and StageB together with their process types $\tau_{Sem}$, $\tau_{StageA}$, and $\tau_{StageB}$; the whole system is $System = (*StageA) \mid StageB^k \mid Sem^k$.]

Figure 3 shows a staged-server system with two stages. This example was inspired by a web-crawler example in [?]. StageA receives inputs from the user and then passes each request to StageB. When StageB responds to the
request, the response is first received by StageA and then passed on to the user. 
The system comprises of an unbounded number of copies of StageA and k copies 
of StageB. For the purpose of resource control, k copies of a semaphore process 
Sem are used to control access to the k copies of StageB. Name generation is 
used to model matching the requests with appropriate responses. With every 
request, StageA generates a new channel y, sends y to StageB, and waits for a 
response on channel y. The right hand side of Figurejshows the process types 
generated by our type system. The channel passing from StageA to StageB has 
been approximated in the process types. Note that StageA sends a new channel 
y to StageB. Upon receiving the channel y, StageB does yl. The type TstageA 
does not send any channels to StageB. The effect of StageB doing yl is statically 
transferred inside the description of TstageA by the type system. 

The process System satisfies the following property: whenever StageA wants to send a message to StageB after successfully acquiring the semaphore (by executing acquire?), the send B![y] never blocks. Even though $\Gamma_{System}$ is an infinite-state system, due to the unbounded number of copies of $\Gamma_{StageA}$, we can check this property on $\Gamma_{System}$ by using a model checker with a counting abstraction similar to [...].
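A small hand-written counting abstraction helps make the property above concrete. The Python sketch below is an illustration under stated assumptions, not the model checker used in the paper. It tracks only the number of copies in each relevant local state of the types in Figure 3. The unbounded pool of $\Gamma_{StageA}$ copies waiting on acquire? is treated as an always-available source, and user inputs on A are assumed to be always available. A StageA copy's residual (y? | y!) after B![y] is dropped, since it touches no shared channel. The state encoding and the names `reachable` and `send_never_blocks` are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

# Abstract state (hypothetical encoding): counts of copies per local state.
# (acquired_a, idle_b, busy_b, free_sem, held_sem)
State = Tuple[int, int, int, int, int]


def reachable(k_sem: int, k_b: int) -> Set[State]:
    """Breadth-first exploration of the counter abstraction.

    The unbounded pool of StageA copies waiting on acquire? is treated as
    omega, so only the StageA copies that already hold a token are counted.
    """
    init: State = (0, k_b, 0, k_sem, 0)
    seen = {init}
    queue = deque([init])
    while queue:
        acq_a, idle_b, busy_b, free_s, held_s = queue.popleft()
        succs = []
        # acquire!/acquire?: a free Sem copy hands a token to a pooled StageA copy
        if free_s > 0:
            succs.append((acq_a + 1, idle_b, busy_b, free_s - 1, held_s + 1))
        # B![y]/B?[y]: a token-holding StageA copy passes its request to an idle StageB copy
        if acq_a > 0 and idle_b > 0:
            succs.append((acq_a - 1, idle_b - 1, busy_b + 1, free_s, held_s))
        # release!/release?: a busy StageB copy returns a token to a holding Sem copy
        if busy_b > 0 and held_s > 0:
            succs.append((acq_a, idle_b + 1, busy_b - 1, free_s + 1, held_s - 1))
        for s in succs:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen


def send_never_blocks(k_sem: int, k_b: int) -> bool:
    """Whenever some StageA copy holds a token, an idle StageB copy can receive on B."""
    return all(idle_b > 0 for (acq_a, idle_b, *_rest) in reachable(k_sem, k_b) if acq_a > 0)


if __name__ == "__main__":
    print(send_never_blocks(3, 3))  # k tokens guarding k StageB copies: True
    print(send_never_blocks(2, 1))  # more tokens than StageB copies: False
```

Every counter is bounded by the number of tokens or StageB copies, so the exploration terminates even though the number of StageA copies is unbounded. When there are more tokens than StageB copies, the check finds a state in which a token-holding StageA copy cannot send on B.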

3 Assume-Guarantee Rule for CCS 

In this section we give the syntax and semantics of CCS processes, and we define trace containment for such processes. The main result of the section is Theorem 1, which gives an assume-guarantee rule for CCS.



3.1 Syntax and Semantics of CCS Processes 



The syntax of CCS processes P is given by the following definition.

$$P ::= \alpha \mid 0 \mid \textstyle\sum_i G_i \mid P|Q \mid \mu\alpha.P \mid (\nu x)P$$

$$G ::= x!^t.P \mid x?^t.P$$
The structural preorder $\preceq$ is the least reflexive and transitive relation closed under the following rules, together with renaming and reordering of bound variables and reordering of terms in a summation. The notation $P \equiv Q$ abbreviates $P \preceq Q$ and $Q \preceq P$. The set of free names of P is denoted fn(P).

$$P|0 \equiv P \qquad P|Q \equiv Q|P \qquad P|(Q|R) \equiv (P|Q)|R$$

$$(\nu x)0 \equiv 0 \qquad \mu\alpha.P \equiv P[\mu\alpha.P/\alpha]$$

$$\frac{x \notin fn(P)}{P \mid (\nu x)Q \preceq (\nu x)(P \mid Q)} \qquad \frac{P \preceq P' \quad Q \preceq Q'}{P|Q \preceq P'|Q'}$$

Fig. 4. Structural Preorder



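$$P \xrightarrow{\epsilon} P \qquad [\text{eps}]$$

$$(\cdots + x!^{t_1}.P + \cdots) \mid (\cdots + x?^{t_2}.Q + \cdots) \xrightarrow{x^{t_1,t_2}} P \mid Q \qquad [\text{react}]$$

$$(\cdots + x!^{t}.P + \cdots) \xrightarrow{x!^{t}} P \qquad [\text{O-comm}]$$

$$(\cdots + x?^{t}.P + \cdots) \xrightarrow{x?^{t}} P \qquad [\text{I-comm}]$$

$$\frac{P \xrightarrow{x^{t_1,t_2}} P'}{(\nu x)P \xrightarrow{\tau^{t_1,t_2}} (\nu x)P'}\;[\text{tau}] \qquad \frac{P \xrightarrow{\ell} P' \quad x \notin \ell}{(\nu x)P \xrightarrow{\ell} (\nu x)P'}\;[\text{res}]$$

$$\frac{P \preceq P' \quad P' \xrightarrow{\ell} Q' \quad Q' \preceq Q}{P \xrightarrow{\ell} Q}\;[\text{S-cong}] \qquad \frac{P \xrightarrow{\ell} P'}{P \mid Q \xrightarrow{\ell} P' \mid Q}\;[\text{par}]$$

Eta rules

$$\frac{P \xrightarrow{x^{t_1,t_2}} P'}{(\eta x)P \xrightarrow{x^{t_1,t_2}} (\eta x)P'}\;[\text{eta1}] \qquad \frac{P \xrightarrow{\ell} P' \quad x \notin \ell}{(\eta x)P \xrightarrow{\ell} (\eta x)P'}\;[\text{eta2}]$$

In the rules above, ℓ ranges over actions of the form $x!^t$, $x?^t$, $x^{t_1,t_2}$, $\tau^{t_1,t_2}$ or $\epsilon$.

Fig. 5. Labeled Reduction on CCS Processes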
We write $*P$ as an abbreviation for $\mu\alpha.(P \mid \alpha)$, $P^*$ as an abbreviation for $\mu\alpha.(P.\alpha)$, and $P^k$ as an abbreviation for k copies of process P in parallel. Throughout this section, P, Q, P' and Q' range over CCS processes.

We augment the usual syntax of CCS with tags on send and receive operations. Actions in CCS are of the form $x!^t$, $x?^t$, $x^{t_1,t_2}$, $\tau^{t_1,t_2}$ or $\epsilon$. The actions $x!^t$ and $x?^t$ denote commitments and actions $x^{t_1,t_2}$ denote reactions. Note that commitments have a single tag and reactions have two tags. The action $\tau^{t_1,t_2}$ denotes the invisible or silent reaction, and action $\epsilon$ denotes the null action.

Figure 5 defines the labeled reduction relation on processes. As indicated by rule S-CONG in Figure 5, reduction is modulo structural congruence, defined in Figure 4. Note that in addition to the usual rules for the restriction operator ν, we have rules eta1 and eta2 for the restriction operator η. This operator is the same as ν, only with different observability properties: the expression $(\eta x)P$ is simply meta-notation for a ν-abstraction whose interactions can be observed. This notation is needed to state our assume-guarantee rule.

Sometimes it is insignificant whether an action is a send or a receive. In such cases, we drop the ? and ! symbols from the action for brevity. Let Act be the set of all actions of the form $x^t$, $x^{t_1,t_2}$, $\tau^{t_1,t_2}$ or $\epsilon$. We use ω, ... to range over finite sequences of actions, and we write ω[i] to denote the i'th element of ω. If P is a CCS process with

$$P \xrightarrow{\omega[0]} P_1 \xrightarrow{\omega[1]} P_2 \cdots P_{n-1} \xrightarrow{\omega[n-1]} P_n$$

then $\omega = \omega[0]\omega[1]\omega[2]\cdots$ is a trace of P. In such cases, we lift reductions to sequences of actions and write $P \xrightarrow{\omega} P_n$. The set of all traces of P is denoted Tr(P). We let Act(ω) denote the set of actions occurring in the trace ω, and we define for a process P the set of actions $Act(P) = \bigcup_{\omega \in Tr(P)} Act(\omega)$.

We will assume that, for any set of processes under consideration, tags are used only once, i.e., no tag occurs twice in the processes. We let T(P) denote the set of tags occurring in P.

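Let $\omega = \omega[0]\omega[1]\cdots\omega[n]$ be a trace in $Tr((\eta\tilde{x})(P \mid Q))$, with $\omega[i] \in Act$. For an element ω[i] of ω we now define the projection of ω[i] onto P, denoted $(\omega[i])_P$, as follows. The definition is by cases over the form of $\omega[i] \in Act$:

$$(x^{t_1,t_2})_P = \begin{cases} x^{t_1,t_2} & \text{if } t_1 \in T(P) \text{ and } t_2 \in T(P) \\ x^{t_1} & \text{if } t_1 \in T(P) \text{ and } t_2 \notin T(P) \\ x^{t_2} & \text{if } t_1 \notin T(P) \text{ and } t_2 \in T(P) \\ \epsilon & \text{if } t_1 \notin T(P) \text{ and } t_2 \notin T(P) \end{cases}$$

$$(\tau^{t_1,t_2})_P = \begin{cases} \tau^{t_1,t_2} & \text{if } t_1 \in T(P) \text{ and } t_2 \in T(P) \\ \epsilon & \text{if } t_1 \notin T(P) \text{ or } t_2 \notin T(P) \end{cases}$$

$$(x^{t})_P = \begin{cases} x^{t} & \text{if } t \in T(P) \\ \epsilon & \text{if } t \notin T(P) \end{cases} \qquad (\epsilon)_P = \epsilon$$

The projection $(\omega[i])_Q$ is defined analogously. If $\omega = \omega[0]\omega[1]\omega[2]\cdots$ is a trace of P | Q, then we define the projection of ω onto P, denoted $\omega_P$, to be given by

$$(\omega_P)[i] = (\omega[i])_P \quad \text{for every index } i \text{ of } \omega$$

and the projection of ω onto Q, denoted $\omega_Q$, is defined analogously.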
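The case analysis above is mechanical, and a short Python transcription makes it explicit. The sketch assumes a hypothetical tuple encoding of actions with the ?/! polarity dropped, as in the text. The tag set T(P) is passed as an explicit argument. `project_action` and `project_trace` are illustrative names.

```python
from typing import List, Set, Tuple

# Hypothetical encoding of actions (polarity dropped, as in the text):
#   ("commit", x, t)       x^t
#   ("react", x, t1, t2)   x^{t1,t2}
#   ("tau", t1, t2)        tau^{t1,t2}
#   ("eps",)               the null action
Action = Tuple[str, ...]
EPS: Action = ("eps",)


def project_action(a: Action, tags: Set[str]) -> Action:
    """(a)_P, where `tags` plays the role of T(P)."""
    kind = a[0]
    if kind == "eps":
        return EPS
    if kind == "commit":
        _, _x, t = a
        return a if t in tags else EPS
    if kind == "react":
        _, x, t1, t2 = a
        in1, in2 = t1 in tags, t2 in tags
        if in1 and in2:
            return a                       # reaction internal to P
        if in1:
            return ("commit", x, t1)       # P contributed the t1 side
        if in2:
            return ("commit", x, t2)       # P contributed the t2 side
        return EPS
    if kind == "tau":
        _, t1, t2 = a
        return a if (t1 in tags and t2 in tags) else EPS
    raise ValueError(f"unknown action {a!r}")


def project_trace(trace: List[Action], tags: Set[str]) -> List[Action]:
    """omega_P: project each position independently, keeping the trace length."""
    return [project_action(a, tags) for a in trace]
```

For example, projecting the trace $x^{t_1,t_2}\,\tau^{t_3,t_4}\,y^{t_5}$ onto a process that owns the tags {t2, t3, t4} yields $x^{t_2}\,\tau^{t_3,t_4}\,\epsilon$.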

We need an operation to combine traces. Let ⊕ be the partial function on Act × Act given by $x^{t_1} \oplus x^{t_2} = x^{t_1,t_2}$ and $\ell \oplus \epsilon = \epsilon \oplus \ell = \ell$ for all $\ell \in Act$. For traces $\omega_1$ and $\omega_2$ of equal length we define $\omega_1 \oplus \omega_2$ by setting $(\omega_1 \oplus \omega_2)[i] = \omega_1[i] \oplus \omega_2[i]$. We consider traces modulo ε, that is, any number of occurrences of ε can be tacitly inserted into or removed from a trace (hence any trace has a representative of any given length greater than some smallest length). Finally, if $\tilde{x}$ is a list of channel names, we define the relation

$$\tilde{x} \vdash \ell \sim \ell'$$

to hold if and only if $\ell \oplus \ell'$ is well defined and both of the following conditions are satisfied for all $x \in \tilde{x}$:

- if ℓ is of the form $x^t$, then $\ell' \neq \epsilon$
- if ℓ' is of the form $x^t$, then $\ell \neq \epsilon$

We lift the relation to traces of equal length n by defining $\tilde{x} \vdash \omega_1 \sim \omega_2$ to hold if and only if for all $i = 0 \ldots n-1$ we have $\tilde{x} \vdash \omega_1[i] \sim \omega_2[i]$.
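A direct transcription of ⊕ and of the relation $\tilde{x} \vdash \ell \sim \ell'$ may help when reading Lemma 2. This Python sketch reuses the same hypothetical tuple encoding of actions (polarity dropped). It returns None where ⊕ is undefined and expects traces already padded with ε to equal length. All function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Action = Tuple[str, ...]   # ("commit", x, t) | ("react", x, t1, t2) | ("tau", t1, t2) | ("eps",)
EPS: Action = ("eps",)


def oplus(l1: Action, l2: Action) -> Optional[Action]:
    """Partial combination l1 (+) l2; None where the text leaves it undefined."""
    if l2 == EPS:
        return l1
    if l1 == EPS:
        return l2
    if l1[0] == "commit" and l2[0] == "commit" and l1[1] == l2[1]:
        return ("react", l1[1], l1[2], l2[2])   # x^{t1} (+) x^{t2} = x^{t1,t2}
    return None


def compatible(xs: Sequence[str], l1: Action, l2: Action) -> bool:
    """xs |- l1 ~ l2: (+) is defined, and a commitment on a channel in xs
    is never paired with the null action on the other side."""
    if oplus(l1, l2) is None:
        return False
    for x in xs:
        if l1[0] == "commit" and l1[1] == x and l2 == EPS:
            return False
        if l2[0] == "commit" and l2[1] == x and l1 == EPS:
            return False
    return True


def combine(w1: List[Action], w2: List[Action]) -> Optional[List[Action]]:
    """Pointwise (+) on traces of equal length; None if undefined at some position."""
    if len(w1) != len(w2):
        raise ValueError("pad the shorter trace with EPS first (traces are taken modulo eps)")
    out = []
    for a, b in zip(w1, w2):
        c = oplus(a, b)
        if c is None:
            return None
        out.append(c)
    return out


def compatible_traces(xs: Sequence[str], w1: List[Action], w2: List[Action]) -> bool:
    """xs |- w1 ~ w2, lifted position by position."""
    return len(w1) == len(w2) and all(compatible(xs, a, b) for a, b in zip(w1, w2))
```

In this encoding, Lemma 2 says that a trace of $(\eta\tilde{x})(A \mid B)$ is exactly `combine(wA, wB)` for projections `wA`, `wB` that are traces of A and B and satisfy `compatible_traces(xs, wA, wB)`.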

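Lemma 1. Assume that $Act(\omega) \subseteq Act(A) \cup Act(B)$. Then we have: $\omega \in Tr(A \mid B)$ if and only if $\omega_A \oplus \omega_B$ is well defined and $\omega_A \in Tr(A)$ and $\omega_B \in Tr(B)$ and $\omega = \omega_A \oplus \omega_B$.

Lemma 2. Assume that $Act(\omega) \subseteq Act(A) \cup Act(B)$. Then we have: $\omega \in Tr((\eta\tilde{x})(A \mid B))$ if and only if $\tilde{x} \vdash \omega_A \sim \omega_B$ and $\omega_A \in Tr(A)$ and $\omega_B \in Tr(B)$ and $\omega = \omega_A \oplus \omega_B$.

Note that Lemma 2 coincides with Lemma 1 if $\tilde{x}$ is empty.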



3.2 Trace Containment 



For a trace ω, let $\omega^\tau$ denote the trace that arises from ω by eliding all τ actions. Also, let $\omega^\circ$ denote the sequence that arises from ω by replacing all actions of the form $x^t$ or $x^{t,t'}$ with x. For a trace ω, we define the norm of ω, denoted N(ω), to be the sequence $(\omega^\tau)^\circ$. We write $\omega =_N \omega'$ as an abbreviation for $N(\omega) = N(\omega')$.
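The norm can be computed in one pass over a trace. The sketch below assumes the same hypothetical tuple encoding of actions. It also drops ε, which is harmless because traces are taken modulo ε.

```python
from typing import Iterable, Tuple

Action = Tuple[str, ...]   # ("commit", x, t) | ("react", x, t1, t2) | ("tau", t1, t2) | ("eps",)


def norm(trace: Iterable[Action]) -> Tuple[str, ...]:
    """N(w) = (w^tau)°: drop tau actions, then keep only channel names.

    The null action is dropped too, since traces are considered modulo eps.
    """
    out = []
    for a in trace:
        if a[0] in ("tau", "eps"):
            continue
        out.append(a[1])   # ("commit", x, t) or ("react", x, t1, t2) -> x
    return tuple(out)


def eq_n(w1: Iterable[Action], w2: Iterable[Action]) -> bool:
    """w1 =_N w2."""
    return norm(w1) == norm(w2)
```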

We say that a process I is trace contained in process S with respect to process P, written $I \sqsubseteq_P S$, if and only if $I = (\eta\tilde{x})(P \mid Q)$ and for every $\omega \in Tr(I)$ there exists $\omega' \in Tr(S)$ such that $\omega' =_N \omega_P$. We abbreviate $I \sqsubseteq_I S$ as $I \sqsubseteq S$.

Let x be a channel name in $\tilde{x}$. We say that x is a non-blocking channel of process P in the process $(\eta\tilde{x})(P \mid Q)$ if and only if, whenever the following conditions hold:

1. $P \xrightarrow{\omega_1} P'$
2. $Q \xrightarrow{\omega_2} Q'$
3. $\tilde{x} \vdash \omega_1 \sim \omega_2$
4. $P' = (\cdots + \alpha^{t}.P'' + \cdots)$

then we have

$$Q' \xrightarrow{\tau^*} (\cdots + \bar{\alpha}^{t'}.Q'' + \cdots)$$

where $\tau^*$ is some sequence of τ actions, and either $\alpha = x!$ and $\bar{\alpha} = x?$, or $\alpha = x?$ and $\bar{\alpha} = x!$.

3.3 Assume-Guarantee Rule 

Given an implementation I and a specification S, suppose we want to check whether $I \sqsubseteq S$. Suppose further that $I = (\eta\tilde{x})(P \mid Q)$ is a composition of two processes P and Q that interact over a set of channels $\tilde{x}$, and that the specification $S = (\eta\tilde{x})(P' \mid Q')$ is structurally similar to the implementation I. Then Theorem 1 gives a way of checking whether $I \sqsubseteq S$ without directly exploring the entire state space of I.

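Theorem 1. (Assume-Guarantee) For any processes P, Q, P', Q', suppose

A1. $(\eta\tilde{x})(P \mid Q') \sqsubseteq_P P'$

A2. $(\eta\tilde{x})(P' \mid Q) \sqsubseteq_Q Q'$

A3. Every channel x in $\tilde{x}$ is either non-blocking for P in $(\eta\tilde{x})(P \mid Q')$ or non-blocking for Q in $(\eta\tilde{x})(P' \mid Q)$.

Then we have

$$(\eta\tilde{x})(P \mid Q) \sqsubseteq (\eta\tilde{x})(P' \mid Q')$$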

Before proving the theorem we state a few lemmas. In the following, we will sometimes use process superscripts on traces, as in $\omega^P$. This naming convention is intended to remind the reader that $\omega^P$ is in Tr(P).

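Lemma 3. If $\tilde{x} \vdash \omega_1 \sim \omega_2$ and $\omega_1 =_N \omega_1'$, then $\tilde{x} \vdash \omega_1' \sim \omega_2$.

Lemma 4. If $\omega_1 \oplus \omega_2$ is well defined and $\omega_1 =_N \omega_1'$, then $\omega_1' \oplus \omega_2$ is well defined and $\omega_1 \oplus \omega_2 =_N \omega_1' \oplus \omega_2$.

Lemma 5. Suppose that

1. $\omega \in Tr((\eta\tilde{x})(P \mid Q))$
2. $\omega^{P'} \in Tr(P')$ with $\omega^{P'} =_N \omega_P$
3. $(\eta\tilde{x})(P' \mid Q) \sqsubseteq_Q Q'$

Then there exists $\omega^{Q'} \in Tr(Q')$ such that $\omega^{Q'} =_N \omega_Q$.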

For natural numbers k, we can talk about trace containment up to k, denoted $\sqsubseteq^k_P$, by defining $(\eta\tilde{x})(P \mid Q) \sqsubseteq^k_P P'$ if and only if for all traces $\omega \in Tr((\eta\tilde{x})(P \mid Q))$ of length at most k, there exists $\omega' \in Tr(P')$ with $\omega' =_N \omega_P$.
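For finite-state processes, $\sqsubseteq^k_P$ can be checked by brute force. The following Python sketch is an illustration under stated assumptions, not a tool from the paper. Processes are given as explicit, hand-built labeled transition systems (the hypothetical `LTS` dictionaries), and T(P) is passed as a tag set. The specification side is explored up to norm length k, so that τ-loops do not prevent termination.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Action = Tuple[str, ...]                   # ("commit", x, t) | ("react", x, t1, t2) | ("tau", t1, t2) | ("eps",)
LTS = Dict[str, List[Tuple[Action, str]]]  # state -> [(action, successor state)]


def _visible(a: Action, tags: Optional[Set[str]]) -> Optional[str]:
    """Channel name that `a` contributes to N((a)_P), or None.

    `tags` plays the role of T(P); None means no projection is applied.
    """
    kind = a[0]
    if kind in ("tau", "eps"):
        return None                        # never survives the norm
    if tags is None:
        return a[1]
    if kind == "commit":                   # x^t survives iff t is P's tag
        return a[1] if a[2] in tags else None
    if kind == "react":                    # x^{t1,t2} survives iff P owns either tag
        return a[1] if (a[2] in tags or a[3] in tags) else None
    raise ValueError(f"unknown action {a!r}")


def norm_traces(lts: LTS, init: str, k: int, tags: Optional[Set[str]] = None) -> Set[Tuple[str, ...]]:
    """{N(w_P) | w a trace from `init` of length at most k}."""
    start = (init, (), 0)
    seen = {start}
    queue = deque([start])
    while queue:
        s, w, n = queue.popleft()
        if n == k:
            continue
        for a, nxt in lts.get(s, []):
            c = _visible(a, tags)
            item = (nxt, w if c is None else w + (c,), n + 1)
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return {w for _, w, _ in seen}


def spec_norms(lts: LTS, init: str, max_len: int) -> Set[Tuple[str, ...]]:
    """{N(w') | w' a trace from `init` with |N(w')| <= max_len}."""
    start = (init, ())
    seen = {start}
    queue = deque([start])
    while queue:
        s, w = queue.popleft()
        for a, nxt in lts.get(s, []):
            c = _visible(a, None)
            nw = w if c is None else w + (c,)
            if len(nw) <= max_len and (nxt, nw) not in seen:
                seen.add((nxt, nw))
                queue.append((nxt, nw))
    return {w for _, w in seen}


def contained_up_to(impl: LTS, impl_init: str, tags_p: Set[str], spec: LTS, spec_init: str, k: int) -> bool:
    """Check (eta x)(P | Q) ⊑^k_P S for explicitly given finite LTSs."""
    return norm_traces(impl, impl_init, k, tags_p) <= spec_norms(spec, spec_init, k)


# Hand-built example (hypothetical): P = x!^t1.y!^t2 and Q = x?^t3 under (eta x).
# The eta rules block lone commitments on x, so the first step is the reaction.
IMPL: LTS = {"s0": [(("react", "x", "t1", "t3"), "s1")], "s1": [(("commit", "y", "t2"), "s2")]}
SPEC_OK: LTS = {"p0": [(("commit", "x", "u1"), "p1")], "p1": [(("commit", "y", "u2"), "p2")]}
SPEC_BAD: LTS = {"p0": [(("commit", "y", "u1"), "p1")], "p1": [(("commit", "x", "u2"), "p2")]}

if __name__ == "__main__":
    print(contained_up_to(IMPL, "s0", {"t1", "t2"}, SPEC_OK, "p0", 3))   # True
    print(contained_up_to(IMPL, "s0", {"t1", "t2"}, SPEC_BAD, "p0", 3))  # False
```

Passing Q's tags instead of P's checks the analogous $\sqsubseteq^k_Q$ obligation against a specification of Q.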
Lemma 6. Suppose, for any k, that

1. $(\eta\tilde{x})(P \mid Q) \sqsubseteq^k_P P'$
2. $(\eta\tilde{x})(P \mid Q) \sqsubseteq^k_Q Q'$

Then $(\eta\tilde{x})(P \mid Q) \sqsubseteq^k (\eta\tilde{x})(P' \mid Q')$.



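Lemma 7. Let $\omega \in Tr((\eta\tilde{x})(P \mid Q))$ and let $\omega^{P|Q'} \in Tr((\eta\tilde{x})(P \mid Q'))$ such that

1. $(\eta\tilde{x})(P \mid Q') \sqsubseteq_P P'$
2. $\omega_P = (\omega^{P|Q'})_P$
3. $\omega^{P|Q'}.\upsilon \in Tr((\eta\tilde{x})(P \mid Q'))$

Then there exists $\omega^{P'} \in Tr(P')$ such that $\omega^{P'} =_N \omega_P.(\upsilon)_P$.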



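Lemma 8. (Context Substitution) Assume

$$(\eta\tilde{x})(P \mid Q) \xrightarrow{\omega} (\eta\tilde{x})(P_k \mid Q_k)$$

and $\omega^{P|Q'} \in Tr((\eta\tilde{x})(P \mid Q'))$ with $\omega =_N \omega^{P|Q'}$ and $\omega_P = (\omega^{P|Q'})_P$. Then

$$(\eta\tilde{x})(P \mid Q') \xrightarrow{\omega^{P|Q'}} (\eta\tilde{x})(P_k \mid Q'_k)$$

for some $Q'_k$.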

We are now ready to prove Theorem 1.

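Proof. Assuming A1, A2 and A3, we prove by induction on the length of traces in $Tr((\eta\tilde{x})(P \mid Q))$ the stronger conclusion

B1. $(\eta\tilde{x})(P \mid Q) \sqsubseteq (\eta\tilde{x})(P' \mid Q')$ and
B2. $(\eta\tilde{x})(P \mid Q) \sqsubseteq_P P'$ and
B3. $(\eta\tilde{x})(P \mid Q) \sqsubseteq_Q Q'$.

Let $\omega = \omega_0\omega_1\cdots\omega_{k-1}\omega_k\cdots\omega_n$ be an arbitrary trace in $Tr((\eta\tilde{x})(P \mid Q))$. Let $\omega_{<i}$ denote the prefix $\omega_0\omega_1\cdots\omega_i$, for $0 \le i \le n$. We assume the induction hypothesis holds for $\omega_{<k}$ and prove it for $\omega_{<(k+1)}$. We first establish the following:

C1. $\omega_{<i} \in Tr((\eta\tilde{x})(P \mid Q))$ for all $0 \le i \le n$.

C2. $\exists \omega_k^{P'|Q} \in Tr((\eta\tilde{x})(P' \mid Q)).\ \omega_{<k} =_N \omega_k^{P'|Q} \wedge (\omega_{<k})_Q = (\omega_k^{P'|Q})_Q$

C3. $\exists \omega_k^{P|Q'} \in Tr((\eta\tilde{x})(P \mid Q')).\ \omega_{<k} =_N \omega_k^{P|Q'} \wedge (\omega_{<k})_P = (\omega_k^{P|Q'})_P$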
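C1 follows by the assumptions and the definition of $\omega_{<i}$. To prove C2, we first observe that, because $\omega_{<k} \in Tr((\eta\tilde{x})(P \mid Q))$, Lemma 2 shows that $(\omega_{<k})_P \in Tr(P)$ and $(\omega_{<k})_Q \in Tr(Q)$ and

$$\tilde{x} \vdash (\omega_{<k})_P \sim (\omega_{<k})_Q \qquad (1)$$

By induction hypothesis (B2) applied to $\omega_{<k}$ we get

$$\exists \omega_k^{P'} \in Tr(P').\ (\omega_{<k})_P =_N \omega_k^{P'} \qquad (2)$$

Now, choose $\omega_k^{P'}$ according to (2). By (1), (2) and Lemma 3 we have

$$\tilde{x} \vdash \omega_k^{P'} \sim (\omega_{<k})_Q \qquad (3)$$

Define $\omega_k^{P'|Q}$ by setting

$$\omega_k^{P'|Q} = \omega_k^{P'} \oplus (\omega_{<k})_Q$$

Then $\omega_k^{P'|Q}$ is well defined and in $Tr((\eta\tilde{x})(P' \mid Q))$ by (3) and Lemma 2. Because (by (2)) we have $\omega_k^{P'} =_N (\omega_{<k})_P$, it follows from Lemma 4 that

$$\omega_k^{P'} \oplus (\omega_{<k})_Q =_N (\omega_{<k})_P \oplus (\omega_{<k})_Q$$

and since the right-hand side equals $\omega_{<k}$ by Lemma 2, this shows that

$$\omega_k^{P'|Q} =_N \omega_{<k} \qquad (4)$$

Furthermore, it follows from the definition of $\omega_k^{P'|Q}$ that

$$(\omega_{<k})_Q = (\omega_k^{P'|Q})_Q \qquad (5)$$

It follows from (4) and (5) that $\omega_k^{P'|Q}$ is a witness of C2. This concludes the proof of C2. The claim C3 is proven by a symmetric argument, using induction hypothesis (B3).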

We now proceed to prove B1, B2 and B3 for the inductive case k + 1 by a case analysis on the form of $\omega_{k+1}$. For space reasons we prove only one representative case. The full proof can be found in the technical report [...].

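- Case 1. $\omega_{k+1}$ is an interaction $x^{t_1,t_2}$ for $x \in \tilde{x}$: WLOG let $t_1 \in T(P)$, $t_2 \in T(Q)$, and let x be non-blocking for P in $(\eta\tilde{x})(P \mid Q')$.

We know from C3 that, for some $\omega_k^{P|Q'} \in Tr((\eta\tilde{x})(P \mid Q'))$, we have

$$\omega_{<k} =_N \omega_k^{P|Q'} \quad\text{and}\quad (\omega_{<k})_P = (\omega_k^{P|Q'})_P \qquad (6)$$

By our assumptions, $\omega_{k+1} = x^{t_1,t_2}$, so that P can make the commitment $x^{t_1}$ in step $\omega_{k+1}$. Hence, we have

$$(\eta\tilde{x})(P \mid Q) \xrightarrow{\omega_{<k}} (\eta\tilde{x})(P_k \mid Q_k) \xrightarrow{x^{t_1,t_2}} (\eta\tilde{x})(P_{k+1} \mid Q_{k+1})$$

with $P_k \xrightarrow{x^{t_1}} P_{k+1}$, for some $P_k, Q_k, P_{k+1}, Q_{k+1}$. By (6) together with Lemma 8, we can conclude that

$$(\eta\tilde{x})(P \mid Q') \xrightarrow{\omega_k^{P|Q'}} (\eta\tilde{x})(P_k \mid Q'_k)$$

for some $Q'_k$. Because x is non-blocking for P in $(\eta\tilde{x})(P \mid Q')$, it follows that

$$Q'_k \xrightarrow{\tau^*} Q'' \xrightarrow{x^{t_3}} Q'_{k+1}$$

for some sequence of τ-actions $\tau^*$ and some $t_3$, $Q''$, $Q'_{k+1}$. Putting the previous results together, we can conclude that

$$(\eta\tilde{x})(P \mid Q') \xrightarrow{\omega_k^{P|Q'}} (\eta\tilde{x})(P_k \mid Q'_k) \xrightarrow{\tau^*} (\eta\tilde{x})(P_k \mid Q'') \xrightarrow{x^{t_1,t_3}} (\eta\tilde{x})(P_{k+1} \mid Q'_{k+1})$$

Hence, we have

$$\omega_k^{P|Q'}.\tau^*.x^{t_1,t_3} \in Tr((\eta\tilde{x})(P \mid Q')) \qquad (7)$$

By (6), (7) and assumption A1, Lemma 7 is applicable (taking $\upsilon = \tau^*.x^{t_1,t_3}$), and we get that there exists $\omega_{k+1}^{P'} \in Tr(P')$ such that

$$\omega_{k+1}^{P'} =_N (\omega_{<k})_P.(\tau^*.x^{t_1,t_3})_P =_N (\omega_{<k})_P.x^{t_1} = (\omega_{<k+1})_P$$

Hence, we have

$$\omega_{k+1}^{P'} =_N (\omega_{<k+1})_P \qquad (8)$$

thereby showing B2 for the inductive step k + 1, as witnessed by $\omega_{k+1}^{P'}$ in (8). Since $\omega_{<k+1} \in Tr((\eta\tilde{x})(P \mid Q))$, it follows from (8) together with A2 via Lemma 5 that there exists $\omega_{k+1}^{Q'} \in Tr(Q')$ with $\omega_{k+1}^{Q'} =_N (\omega_{<k+1})_Q$. This shows B3 for the inductive step k + 1. Lemma 6 applied to B2 and B3 for k + 1 then yields B1 for the step k + 1.

- Remaining cases. See the technical report [...].

□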

A model checker can discharge obligations A1, A2 and A3 and reach the desired conclusion. Note that

$$(\eta\tilde{x})(P \mid Q) \sqsubseteq (\eta\tilde{x})(P' \mid Q') \implies (\nu\tilde{x})(P \mid Q) \sqsubseteq (\nu\tilde{x})(P' \mid Q')$$

Thus, η is just a meta-process notation that lets us state the obligations A1, A2 and A3, all of which require observing the interactions on channels in $\tilde{x}$. We note that model checking CCS is undecidable in general. However, model checking is decidable for certain fragments of CCS, such as the finite-control fragment, which disallows parallel composition inside μ-recursion, and the ν-free fragment. In order to model check an arbitrary CCS process, one could first construct an abstraction that falls in such a decidable fragment and then model check the abstraction.

In the reduction rules below, the structural preorder $\preceq$ is as defined in Figure 4, with the additional rule $*P \preceq *P \mid P$.

Syntax

$$P ::= 0 \mid \textstyle\sum_i G_i \mid P|Q \mid *P \mid (\nu x)P \qquad G ::= x!^t[z].P \mid x?^t[y].P$$

Semantics

$$(\cdots + x!^{t_1}[z].P + \cdots) \mid (\cdots + x?^{t_2}[y].Q + \cdots) \xrightarrow{x^{t_1,t_2}} P \mid [z/y]Q \qquad [\text{R-Com}]$$

$$\frac{P \xrightarrow{\ell} P'}{P \mid Q \xrightarrow{\ell} P' \mid Q}\;[\text{R-Par}] \qquad \frac{P \preceq P' \quad P' \xrightarrow{\ell} Q' \quad Q' \preceq Q}{P \xrightarrow{\ell} Q}\;[\text{R-SP-Cong}]$$

$$\frac{P \xrightarrow{x^{t_1,t_2}} P'}{(\nu x)P \xrightarrow{\tau^{t_1,t_2}} (\nu x)P'}\;[\text{R-New1}] \qquad \frac{P \xrightarrow{\ell} P' \quad x \notin \ell}{(\nu x)P \xrightarrow{\ell} (\nu x)P'}\;[\text{R-New2}]$$

Fig. 6. Syntax and Semantics of π Calculus

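Theorem 1 generalizes to the case of any finite parallel composition. Since the proof method is the same as in the proof of Theorem 1, we confine ourselves to recording the statement.

Theorem 2. (Assume-Guarantee) Let $Q_i = P'_1 \mid \cdots \mid P'_{i-1} \mid P'_{i+1} \mid \cdots \mid P'_n$, for $i = 1 \ldots n$. Then the following inference rule is sound

$$\frac{\forall i.\ (\eta\tilde{x})(P_i \mid Q_i) \sqsubseteq_{P_i} P'_i}{(\eta\tilde{x})(P_1 \mid \cdots \mid P_n) \sqsubseteq (\eta\tilde{x})(P'_1 \mid \cdots \mid P'_n)}\;(*)$$

provided that the side condition (*) is satisfied:

$$\forall x \in \tilde{x}.\ \forall i, j.\ \text{either } x \text{ is non-blocking for } P_i \text{ in } (\eta\tilde{x})(P_i \mid Q_i) \text{ or } x \text{ is non-blocking for } P_j \text{ in } (\eta\tilde{x})(P_j \mid Q_j)$$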



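4 A Behavioral Module System for π-Calculus

The syntax and semantics of the π calculus are shown in Figure 6. We use abstract processes Γ for types and type environments. A type judgment in this system is of the form $\Gamma \triangleright P$, meaning that the abstract process Γ is a correct abstraction of the concrete π calculus process P. In the sequel, we will refer to abstract processes Γ as process types. Throughout the remainder of this section, P, $P_i$ and P' range over π-calculus processes.

$$0 \triangleright 0\;[\text{T-Zero}] \qquad \frac{\Gamma_1 \triangleright P_1 \quad \Gamma_2 \triangleright P_2}{\Gamma_1 \mid \Gamma_2 \triangleright P_1 \mid P_2}\;[\text{T-Par}] \qquad \frac{\Gamma \triangleright P}{*\Gamma \triangleright *P}\;[\text{T-Rep}] \qquad \frac{\Gamma \triangleright P \quad \Gamma \le \Gamma'}{\Gamma' \triangleright P}\;[\text{T-Sub}]$$

$$\frac{\gamma_i \triangleright G_i \text{ for } i = 1 \ldots n}{\gamma_1 + \cdots + \gamma_n \triangleright G_1 + \cdots + G_n}\;[\text{T-Choice}] \qquad \frac{\Gamma_1 \triangleright P}{x!^t[(y)\Gamma_2].(\Gamma_1 \mid [z/y]\Gamma_2) \triangleright x!^t[z].P}\;[\text{T-Out}]$$

$$\frac{\Gamma \triangleright P}{x?^t[(y)(\Gamma \uparrow_{\mathcal{V}-\{y\}})].(\Gamma \uparrow_{\{y\}}) \triangleright x?^t[y].P}\;[\text{T-In}]$$

$$\frac{\Gamma \triangleright P \quad WF(\Gamma \uparrow_{\mathcal{V}-\tilde{x}}) \quad (\eta\tilde{x})\Gamma \uparrow_{\mathcal{V}-\tilde{x}} \models \Box\phi}{(\nu\tilde{x})\Gamma \triangleright (\nu\tilde{x})P}\;[\text{T-New}]$$

Fig. 7. Typing Rules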

Our type system is a variant of the system presented by Igarashi and Kobayashi [...], the primary difference being that our process types are exactly CCS. The process types of Igarashi and Kobayashi form a subcalculus of CCS, because they do not include the name restriction operator ν. The inclusion of name restriction enables us to type processes more precisely. Consider the following example process $P = (\nu c, d)P'$, where P' is the process



(cl.d\ + \ 
V c!.d! ) 







(d?*7 



The type of P' in the Igarashi-Kobayashi type system is the process type Γ given by



f c?.dl + \ , fti \ 

d.d! J ' t^.cT^ 



(d?*7 



In this type, the restriction (νx)... has been elided, and all occurrences of x are replaced by tags $t_1, t_2, t_3$, with $t.\Gamma'$ reducing to Γ'. This is an abstraction of name restriction, and it introduces an overapproximation of the concrete semantics. In this case, the type Γ contains an execution where the receive on d blocks forever, which arises when reductions on $t_2$ and $t_3$ are followed by a reaction between c! and c?. However, in process P all executions result in a successful interaction on channel d. In contrast, since our process types contain name restriction, the process type of P is identical to P in our system.

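The process types in our type system are defined by the following syntax:

$$\begin{array}{lcl}
\tau \text{ (tuple types)} & ::= & (x_1, x_2, \ldots, x_n)\Gamma \\
\Gamma \text{ (process types)} & ::= & 0 \mid \alpha \mid \gamma_1 + \cdots + \gamma_n \mid (\Gamma_1 \mid \Gamma_2) \mid \mu\alpha.\Gamma \mid (\nu x)\Gamma \\
\gamma & ::= & x!^t[\tau].\Gamma \mid x?^t[\tau].\Gamma
\end{array}$$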

This language is equivalent to CCS with typed channels. Channel types do not influence the reduction semantics of process types. However, they are used by the type system to model higher-order message passing in the π calculus. The reduction semantics of process types is given by Figure 5. For a given process type Γ, each tag t uniquely determines an occurrence of either $x!^t[\tau]$ or $x?^t[\tau]$. In the context of Γ, let $\mathcal{T}(t)$ denote the type τ thus associated with t. We abbreviate $x!^t[()0]$ as $x!^t$.

The typing rules of our type system are shown in Figure 7. The type system includes subtyping in rule [T-Sub]. Our subtyping relation ≤ is weak simulation [19], defined as $\Gamma_1 \le \Gamma_2$ if and only if for all action sequences ω, whenever $\Gamma_1 \xrightarrow{\omega} \Gamma_1'$, there exist ω' and $\Gamma_2'$ such that $\Gamma_2 \xrightarrow{\omega'} \Gamma_2'$ with $\omega =_N \omega'$ and $\Gamma_1' \le \Gamma_2'$. This definition of ≤ satisfies the axioms for a proper subtyping relation as defined by Igarashi and Kobayashi [...]. The rule for name restriction [T-New] and the rule for input [T-In] use the anonymization operator $\Gamma \uparrow_S$, where S is a set of channel names. The formula □φ in rule [T-New] refers to an invariant, such as deadlock freedom. In order to define the anonymization operator, we first define the type elimination operator $\Gamma\backslash_S$, where S is a set of channel names, as follows:



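$$\begin{array}{lcl}
0\backslash_S & = & 0 \\
\alpha\backslash_S & = & \alpha \\
(x?^t[\tau].\Gamma)\backslash_S & = & \begin{cases} x?^t[\tau].(\Gamma\backslash_S) & \text{if } x \notin S \\ x?^t.(\Gamma\backslash_S) & \text{otherwise} \end{cases} \\
(x!^t[\tau].\Gamma)\backslash_S & = & \begin{cases} x!^t[\tau].(\Gamma\backslash_S) & \text{if } x \notin S \\ x!^t.(\Gamma\backslash_S) & \text{otherwise} \end{cases} \\
(\gamma_1 + \cdots + \gamma_n)\backslash_S & = & (\gamma_1\backslash_S) + \cdots + (\gamma_n\backslash_S) \\
(\Gamma_1 \mid \Gamma_2)\backslash_S & = & (\Gamma_1\backslash_S) \mid (\Gamma_2\backslash_S) \\
(\mu\alpha.\Gamma)\backslash_S & = & \mu\alpha.(\Gamma\backslash_S) \\
((\nu x)\Gamma)\backslash_S & = & (\nu x)(\Gamma\backslash_{S-\{x\}})
\end{array}$$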



Note that the type elimination operator leaves ν-bound names intact. For any set of channels S, let $\mathcal{G}(S)$ be the most general environment on channels S, defined by:

$$\mathcal{G}(S) = \mu\alpha.\Big(\sum_{x \in S}(x! + x?)\Big).\alpha$$

The anonymization operator $\Gamma \uparrow_S$ is defined as:¹

$$\Gamma \uparrow_S \;=\; (\nu S)\big(\Gamma\backslash_S \mid \mathcal{G}(S)\big)$$
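Elimination and anonymization are plain structural recursions over process types. The Python sketch below uses a hypothetical dataclass encoding of the process-type syntax, with channel types τ kept opaque. It writes the sum in $\mathcal{G}(S)$ as branches x!.α and x?.α, and it invents fresh tags of the form g!x and g?x for the prefixes of $\mathcal{G}(S)$. None of these names come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class Var:
    name: str                 # recursion variable alpha


@dataclass(frozen=True)
class Prefix:
    pol: str                  # "!" or "?"
    chan: str
    tag: str
    ty: Optional[object]      # channel type tau; None once it has been eliminated
    cont: object


@dataclass(frozen=True)
class Sum:
    branches: Tuple[object, ...]


@dataclass(frozen=True)
class Par:
    left: object
    right: object


@dataclass(frozen=True)
class Rec:
    var: str
    body: object


@dataclass(frozen=True)
class New:
    chan: str
    body: object


def elim(g, s: FrozenSet[str]):
    """Type elimination Gamma\\S: drop the channel types carried on channels in s."""
    if isinstance(g, (Zero, Var)):
        return g
    if isinstance(g, Prefix):
        ty = None if g.chan in s else g.ty
        return Prefix(g.pol, g.chan, g.tag, ty, elim(g.cont, s))
    if isinstance(g, Sum):
        return Sum(tuple(elim(b, s) for b in g.branches))
    if isinstance(g, Par):
        return Par(elim(g.left, s), elim(g.right, s))
    if isinstance(g, Rec):
        return Rec(g.var, elim(g.body, s))
    if isinstance(g, New):
        # nu-bound names are left intact: the bound name is removed from s
        return New(g.chan, elim(g.body, s - {g.chan}))
    raise TypeError(f"not a process type: {g!r}")


def most_general(s: FrozenSet[str]):
    """G(S) = mu a.(sum over x in S of x!.a + x?.a), with hypothetical fresh tags."""
    branches = tuple(Prefix(p, x, f"g{p}{x}", None, Var("a")) for x in sorted(s) for p in "!?")
    return Rec("a", Sum(branches))


def anonymize(g, s: FrozenSet[str]):
    """Gamma up_S = (nu S)(Gamma\\S | G(S)), one New per channel in sorted order."""
    result = Par(elim(g, s), most_general(s))
    for x in sorted(s, reverse=True):
        result = New(x, result)
    return result
```

With this encoding, `anonymize(g, s)` composes the type with an environment that may send and receive on every channel in s, and then hides those channels.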



The rule [T-New] uses the predicate WF(Γ), which is defined to hold if and only if for all traces $\omega = \omega_0\omega_1\cdots\omega_n \in Tr(\nu\mathcal{V}.\Gamma)$ and all $0 \le i \le n$, if $\omega_i = \tau^{t_1,t_2}$ then $\mathcal{T}(t_1) \ge \mathcal{T}(t_2)$.
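For finite-state process types, the subtyping relation ≤ used by [T-Sub] can be computed as a greatest fixed point. The sketch below relies on the usual single-step characterization of weak simulation: each step of Γ1 is matched by τ*·a·τ* in Γ2, or by τ* alone for an invisible step. We assume this coincides with the trace-based definition given earlier. Labels are compared by channel name only, mirroring $=_N$. The LTS encoding and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Action = Tuple[str, ...]                   # ("commit", x, t) | ("react", x, t1, t2) | ("tau", t1, t2) | ("eps",)
LTS = Dict[str, List[Tuple[Action, str]]]  # state -> [(action, successor state)]


def _label(a: Action) -> Optional[str]:
    """Visible label under =_N: tau and eps are invisible, tags are ignored."""
    return None if a[0] in ("tau", "eps") else a[1]


def _tau_closure(lts: LTS, s: str) -> Set[str]:
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, v in lts.get(u, []):
            if _label(a) is None and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def _weak_succ(lts: LTS, s: str, lab: Optional[str]) -> Set[str]:
    """States reachable by tau* lab tau* (or by tau* alone when lab is None)."""
    pre = _tau_closure(lts, s)
    if lab is None:
        return pre
    mid = {v for u in pre for a, v in lts.get(u, []) if _label(a) == lab}
    return set().union(*(_tau_closure(lts, v) for v in mid))


def _states(lts: LTS, init: str) -> Set[str]:
    return {init} | set(lts) | {v for succ in lts.values() for _, v in succ}


def weakly_simulated(lts1: LTS, s1: str, lts2: LTS, s2: str) -> bool:
    """s1 <= s2: start from all pairs and remove pairs until none violates the simulation condition."""
    rel = {(p, q) for p in _states(lts1, s1) for q in _states(lts2, s2)}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            for a, p2 in lts1.get(p, []):
                if not any((p2, q2) in rel for q2 in _weak_succ(lts2, q, _label(a))):
                    rel.discard((p, q))
                    changed = True
                    break
    return (s1, s2) in rel
```

A result of True for `weakly_simulated(lts1, s1, lts2, s2)` corresponds to $\Gamma_1 \le \Gamma_2$, that is, $\Gamma_2$ may replace $\Gamma_1$ as a type in rule [T-Sub].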

The rule [T-New] is parameterized on the formula □φ. The formula expresses a safety property on the channels $\tilde{x}$; for example, it could express the invariant of deadlock freedom on channels in $\tilde{x}$. Discharging this assumption requires CCS model checking of the process type $(\eta\tilde{x})\Gamma \uparrow_{\mathcal{V}-\tilde{x}}$.

¹ This definition is equivalent to the definition provided by Igarashi and Kobayashi. Our definition allows refining $\mathcal{G}(S)$ using flow computation, in order to improve precision.

We now state a subject reduction theorem, similar to the one in [...]. A proof of the theorem can be found in the technical report [...].

Theorem 3 (Subject Reduction). If $\Gamma \triangleright P$ and $P \xrightarrow{\ell} P'$ with $WF(\Gamma)$, then there exists Γ' such that $\Gamma \xrightarrow{\ell} \Gamma'$ and $\Gamma' \triangleright P'$.

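Type checking in our type system requires a model checking step to discharge the assumption

$$(\eta\tilde{x})\Gamma \uparrow_{\mathcal{V}-\tilde{x}} \models \Box\phi \qquad (9)$$

in the rule [T-New]. Since our types are CCS processes, Theorem 1 applies. We can therefore alleviate the state-space explosion in the model checker using user-specified types. More precisely, Theorem 1 shows that the following inference rule is sound for discharging assumption (9) when Γ is a composition $\Gamma = \Gamma_1 \mid \Gamma_2$:

$$\frac{(\eta\tilde{x})(\Gamma_1' \mid \Gamma_2') \models \Box\phi \qquad (\eta\tilde{x})(\Gamma_1 \mid \Gamma_2') \sqsubseteq_{\Gamma_1} \Gamma_1' \qquad (\eta\tilde{x})(\Gamma_1' \mid \Gamma_2) \sqsubseteq_{\Gamma_2} \Gamma_2'}{(\eta\tilde{x})(\Gamma_1 \mid \Gamma_2) \models \Box\phi}\;[\text{AG}]$$

provided that every channel x in $\tilde{x}$ is either non-blocking for $\Gamma_1$ in $(\eta\tilde{x})(\Gamma_1 \mid \Gamma_2')$ or non-blocking for $\Gamma_2$ in $(\eta\tilde{x})(\Gamma_1' \mid \Gamma_2)$. Intuitively, the last two premises and the side condition give $(\eta\tilde{x})(\Gamma_1 \mid \Gamma_2) \sqsubseteq (\eta\tilde{x})(\Gamma_1' \mid \Gamma_2')$ by Theorem 1, so the safety property □φ established for the specification carries over to the implementation.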

The behavioral types $\Gamma_1'$ and $\Gamma_2'$ are user-specified types, analogous to user-specified type signatures in a type system such as ML. In order to apply the [AG] rule, a model checker is needed to discharge the assumptions of the rule and its side condition. However, the types $\Gamma_1'$ and $\Gamma_2'$ are typically more abstract than $\Gamma_1$ and $\Gamma_2$, and consequently the state spaces of $\Gamma_1'$ and $\Gamma_2'$ could be much smaller than the state spaces of $\Gamma_1$ and $\Gamma_2$. Thus, using the [AG] rule helps us avoid exploring the state space of $\Gamma_1 \mid \Gamma_2$, thereby alleviating state explosion. Indeed, if the program subjected to type checking is well modularized, this may save an exponential amount of work.



5 Related Work 



Several behavioral type systems have been proposed recently in which types are process-like structures, including [...]. Other analyses have also been proposed to check behavioral properties of concurrent programs, including [...].

Our work was inspired foremost by the generic type system of Igarashi and Kobayashi [...]. While Igarashi and Kobayashi use a ν-free fragment of CCS for their process types, our type system uses the entire CCS. In particular, the presence of hiding in the form of name restriction in the process types improves precision, and opens up several opportunities for modular type checking by exploiting hiding. We use an assume-guarantee principle to discharge the safety check at name restriction (rule [T-New]) in a modular way. Using this technique, we can exploit abstract behavioral specifications. The use of this principle in the context of behavioral type systems appears to be new.

Model checking CCS processes is undecidable in general. However, decidable fragments of CCS have been identified by either disallowing parallel composition under recursion or disallowing name restriction [...]. Tools have been built to perform bisimulation checking, refinement and model checking of such decidable fragments of CCS [...].

Assume-guarantee rules that allow apparently circular assumptions about operating contexts can be traced back to [...]. Recent work has used such techniques to model check large hardware circuits [...]. However, all these rules require a non-blocking assumption on the process calculus, and are not directly applicable to model checking CCS. Our assume-guarantee rule for CCS requires progress as a side condition that needs to be checked using the model checker, and, to the best of our knowledge, our soundness result for assume-guarantee reasoning in CCS is new.



6 Conclusion 

Checking behavioral properties of concurrent, message-passing programs is an important and difficult problem in today's distributed programming environments.
The major obstacle for doing this in practice is the state-explosion problem 
inherent in model checking. Previous work in model checking strongly suggests 
that solving this problem requires abstraction and modular methods, so that 
one can check a large system by checking parts of the system and combine the 
results. 

In sequential languages such as ML, module systems have proven to be very 
successful for providing both abstraction and modularity. However, the sequen- 
tial notion of a module system cannot be directly applied to checking behavioral 
types of concurrent programs, because it is much harder to define what a module 
boundary means in concurrent contexts. In particular, few interesting properties 
of concurrent programs are satisfied independent of their intended context of use. 
Hence, it appears that new principles of modularity are needed for behavioral 
type systems. 

In this paper, we have proposed that assume-guarantee reasoning is a key principle for modular behavioral type checking. Using the assume-guarantee principle, the behavior of a module is precisely guaranteed only under assumptions about its concurrent context. If we want to combine two modules, they can be checked under apparently circular assumptions on each other's behavioral “signatures” (circularity is resolved by an induction over time). If programs are well modularized, this principle can lead to an exponential speedup of type checking. Furthermore, this principle suggests ways in which users can provide abstract behavioral specifications at module boundaries.
CCS processes have been proposed as behavioral types for π calculus programs. In order to enable assume-guarantee reasoning for a general class of behavioral type systems, we have proven the assume-guarantee principle sound for CCS with respect to trace containment. To the best of our knowledge, this result is new for CCS. Prior assume-guarantee results require non-blocking semantics on the process calculus and hence cannot be directly applied to CCS. We have shown how this result can be integrated into a particular type system for the π calculus, thereby enabling modular behavioral type checking for message-passing programs. Hiding in the form of name restriction permits writing modular programs in the π calculus. Our type system exploits hiding to decompose the type checking problem.

Much work remains to be done to apply our results to a realistic programming language. In addition to handling the variety of constructs in a realistic language, we need to provide a natural way for the programmer to write behavioral specifications. A realistic system will require a combination of automation (type inference and model checking) and user annotations (behavioral specifications), and it must allow important programming idioms to type-check.
Abstract. We introduce an abstract interpretation framework for Mobile Ambients, based on a new fixed-point semantics. Then, within this setting, we derive two analyses computing a safe approximation of a property about the run-time topological structure of processes which is relevant to security.



1 Introduction 



Mobile Ambients (MA) has recently emerged as a core programming language for the Web, and at the same time as a model for reasoning about properties of mobile processes. MA is based on the notion of ambient. An ambient is a bounded place where multi-threaded computation takes place; roughly speaking, it generalizes both the idea of agent and the idea of location. Each ambient has a name, a collection of local processes and a collection of subambients. Ambients are organized in a tree, which can be dynamically modified according to three basic capabilities: $\mathit{in}\,n$ allows an ambient to enter an ambient n ($m[\mathit{in}\,n.P_1 \mid P_2] \mid n[Q] \longmapsto n[m[P_1 \mid P_2] \mid Q]$); $\mathit{out}\,n$ allows an ambient to exit from an ambient n ($n[m[\mathit{out}\,n.P_1 \mid P_2] \mid Q] \longmapsto m[P_1 \mid P_2] \mid n[Q]$); $\mathit{open}\,n$ allows the boundary of an ambient n to be destroyed ($\mathit{open}\,n.P \mid n[Q] \longmapsto P \mid Q$).
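The three reduction rules can be illustrated on an explicit tree of ambients. The Python sketch below illustrates only the informal rules above; it is not the normal semantics developed in this paper. It uses a hypothetical `Amb` node that holds a name, a list of active capability sequences and a list of subambients. Each function fires at most one reduction at the given node, and restriction, replication and labels are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Cap = Tuple[str, str]  # (capability kind: "in", "out" or "open"; target ambient name)


@dataclass(eq=False)  # identity equality, so list.remove() removes this exact node
class Amb:
    name: str
    threads: List[List[Cap]] = field(default_factory=list)  # active capability sequences
    children: List["Amb"] = field(default_factory=list)     # subambients


def _child(parent: Amb, name: str, exclude: Optional[Amb] = None) -> Optional[Amb]:
    """First subambient of `parent` called `name`, other than `exclude`."""
    for c in parent.children:
        if c.name == name and c is not exclude:
            return c
    return None


def step_in(parent: Amb) -> bool:
    """m[in n.P1 | P2] | n[Q] -> n[m[P1 | P2] | Q] for siblings m, n inside `parent`."""
    for m in parent.children:
        for th in m.threads:
            if th and th[0][0] == "in":
                n = _child(parent, th[0][1], exclude=m)
                if n is not None:
                    th.pop(0)                      # consume the capability
                    parent.children.remove(m)
                    n.children.append(m)
                    return True
    return False


def step_out(parent: Amb) -> bool:
    """n[m[out n.P1 | P2] | Q] -> m[P1 | P2] | n[Q] where n is a subambient of `parent`."""
    for n in parent.children:
        for m in n.children:
            for th in m.threads:
                if th and th[0] == ("out", n.name):
                    th.pop(0)
                    n.children.remove(m)
                    parent.children.append(m)
                    return True
    return False


def step_open(amb: Amb) -> bool:
    """open n.P | n[Q] -> P | Q inside `amb`: n's threads and subambients move up into amb."""
    for th in amb.threads:
        if th and th[0][0] == "open":
            n = _child(amb, th[0][1])
            if n is not None:
                th.pop(0)
                amb.children.remove(n)
                amb.threads.extend(n.threads)
                amb.children.extend(n.children)
                return True
    return False
```

The sketch rearranges an explicit tree directly instead of rewriting terms modulo structural congruence. In that respect it is closer in spirit to the tree-of-ambients view taken by the normal semantics than to the standard reduction semantics.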

Several static techniques, such as Type Systems [...] and Control Flow Analysis (CFA) [...], have been devised to study and establish various security properties of MA, based on notions of classification and information flow. Although these approaches are closely related and compute safe approximations of similar properties concerning the run-time topological structure of processes, their formulation as “ad hoc” syntax-directed systems makes it difficult to compare them formally.

We follow here the abstract interpretation approach to program analysis [...]. The main idea of abstract interpretation is that program analyses effectively compute an approximation of the program semantics, so that the specification of program analyses should be formally derivable from the specification of the semantics. The typical approach suggested by abstract interpretation consists of: replacing the concrete domain of computation with an abstract domain modelling the property we are interested in; establishing a relation between the concrete and the abstract domain which formalizes safeness and precision of approximations; and deriving, in a systematic way, an approximate safe semantics over the abstract domain. We refer the reader to the Appendix for more details on abstract interpretation theory. One of the most important and critical steps in applying abstract interpretation is the choice of the concrete semantics to start from. The standard reduction semantics of MA is not well suited to abstraction, because it heavily relies on the syntax and uses structural rules and structural congruence to bring the participants of a potential reaction into contiguous positions. In order to overcome these problems, we propose a new semantics for MA (equivalent to the standard reduction semantics) based on the simple observation that an MA process is essentially a tree of ambients, each one containing a set of active processes controlling its movements. Then we obtain, by stepwise approximation of the semantics, two analyses computing a safe approximation of the following (run-time) information: for any ambient, which ambients and capabilities may be contained inside it. This information has been fruitfully used to establish security properties, such as secrecy [...], and to validate a firewall protocol [...].

Because of space limitations we omit the presentation of the standard reduction semantics of MA [...] and the comparison with our normal semantics, which can be found in the extended version of the paper [...]. The normal semantics is presented in Section 3 and the two derived abstractions in Sections 4 and 5.

2 Mobile Ambients 

We enhance the standard syntax of the Mobile Ambients calculus without communication by adding labels to capabilities, restrictions and ambients, and by partitioning the set of names.

Let $\mathcal{L}$ be a set of labels (ranged over by $\ell, \ell', \ldots$) and let $\mathcal{L}_I = \{\ell_i \mid \ell \in \mathcal{L}, i \in I\}$ be the corresponding set of indexed labels (ranged over by $\lambda, \mu, \gamma, \ldots$). Let $\mathcal{N}$ (ranged over by $n, m, h, k, \ldots$) and $\bar{\mathcal{N}}$ (ranged over by $\bar{n}, \bar{m}, \bar{h}, \bar{k}, \ldots$) be sets of names such that $\mathcal{N} \cap \bar{\mathcal{N}} = \emptyset$. Moreover, let $\bar{\mathcal{N}}_I = \{\bar{n}_i \mid \bar{n} \in \bar{\mathcal{N}}, i \in I\}$ be the set of indexed names derived from $\bar{\mathcal{N}}$. For a more compact notation we may also use $n, m, h, k, \ldots$ for generic elements of $\bar{\mathcal{N}}_I$, when the meaning is clear from the context.

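Definition 2.1 (Processes). Processes over indexed labels $\mathcal{L}_I$ and names $\mathcal{N} \cup \bar{\mathcal{N}}_I$ are built according to the following syntax:

$$\begin{array}{lll@{\qquad}lll}
M, N ::= & & \text{(capabilities)} & P, Q ::= & & \text{(processes)} \\
 & \mathit{in}\,n & \text{enter } n & & 0 & \text{inactivity} \\
 & \mathit{out}\,n & \text{exit } n & & (\nu n^{\lambda})P & \text{restriction} \\
 & \mathit{open}\,n & \text{open } n & & P \mid Q & \text{parallel composition} \\
 & & & & !P & \text{replication} \\
 & & & & n^{\lambda}[P] & \text{ambient} \\
 & & & & M^{\lambda}.P & \text{prefix}
\end{array}$$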
Standard syntactic conventions are used: trailing zeros are omitted, and parallel composition has the lowest syntactic precedence. We refer to the usual notions of names, free names, and bound names of a process P as n(P), fn(P), bn(P).

We present some basic concepts, which are necessary to define the semantics and the abstractions. A substitution of names and indexed names is a function $\eta : \mathcal{N} \cup \bar{\mathcal{N}}_I \to \mathcal{N} \cup \bar{\mathcal{N}}_I$. Standard notation $P[m/n]$ and $P\eta$ is used for the application of a substitution to a process. As far as indexed labels are concerned, we denote by $\Lambda(P)$ the set of labels occurring in P. A renaming of indexed labels is a function $\rho : \mathcal{L}_I \to \mathcal{L}_I$. Standard notation $P[\lambda/\mu]$ and $P\rho$ is used for the application of a renaming to a process.

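Indexed names $\hat{\mathcal{N}}_I$ are used to replace bound names with fresh names. To that purpose we assume two injective functions $\varphi_{\mathcal{L}} : \mathcal{L} \to \hat{\mathcal{N}}$ and $\varphi_{\mathcal{L}_I} : \mathcal{L}_I \to \hat{\mathcal{N}}_I$ such that $\varphi_{\mathcal{L}_I}(\ell_i) = \hat n_i$ if $\varphi_{\mathcal{L}}(\ell) = \hat n$. We also adopt a notion of equivalence over processes. We define the renaming and substitution of indexes induced by an injective function $t : I \to I$ as $\rho_t : \mathcal{L}_I \to \mathcal{L}_I$ and $\eta_t : \hat{\mathcal{N}}_I \to \hat{\mathcal{N}}_I$, such that $\rho_t(\ell_i) = \ell_{t(i)}$ and $\eta_t(\hat n_i) = \hat n_{t(i)}$. We say that $P$ and $Q$ are equivalent up to renaming and substitution of indexes ($P \sim Q$) if $P\rho_t\eta_t = Q$ for the renaming $\rho_t$ and the substitution $\eta_t$ induced by some $t$.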

Moreover, we say that a process $P$ is active if $P = M_\lambda.Q$ or $P = !Q$. In the following we use $\mathcal{P}$ and $\mathcal{AP}$ respectively for the set of processes and for the subset of active processes. Furthermore, we use $\mathcal{A}$ (ranged over by $a, b, c, \ldots$) for the set of labelled names $n_\lambda$ such that $n \in \mathcal{N} \cup \hat{\mathcal{N}}_I$ and $\lambda \in \mathcal{L}_I$, augmented with a distinct symbol $@$.

Remark 2.2. It is worth mentioning that the idea of annotating processes is typical of Flow Logic [?], where labels are introduced to keep distinct multiple occurrences of the same object and to handle $\alpha$-conversion. In our approach indexed labels and names are similarly used both in the normal semantics and in the abstract interpretation. In particular, indexes play no essential role in the normal semantics and in the second abstraction. Instead, they are necessary to define the first abstraction (see Example 4.9 and Example 4.10 of Section 4).

3 The Normal Semantics 

The normal semantics is based on the intuitive interpretation of a process as a 
tree of ambients, each one containing a set of active processes. We use a set, called 
a topology, to represent the tree of ambients, and a set, called a configuration, 
to represent the active processes contained in each ambient. 

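For instance, the process $a[\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma \mid \mathit{out}\,m_\delta.d[\mathbf{0}] \mid b[\mathbf{0}]] \mid c[!\mathit{in}\,m_\lambda]$ (where $a, b, c, d \in \mathcal{A}$ are labelled ambients) is represented by

$$(\{a^{@}, c^{@}, b^{a}\},\ \{{}^{a}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma,\ {}^{a}\mathit{out}\,m_\delta.d[\mathbf{0}],\ {}^{c}!\mathit{in}\,m_\lambda\}).$$

The topology contains $b^{a}$ because $b$ is a son of $a$, and contains $a^{@}$ and $c^{@}$ because $a$ and $c$ are sons of the outermost ambient $@$. The configuration contains ${}^{c}!\mathit{in}\,m_\lambda$ because the active process $!\mathit{in}\,m_\lambda$ is executable inside $c$. Moreover, it contains ${}^{a}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma$ and ${}^{a}\mathit{out}\,m_\delta.d[\mathbf{0}]$ because both the active processes $\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma$ and $\mathit{out}\,m_\delta.d[\mathbf{0}]$ are executable inside $a$.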

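In this representation restriction is handled in a particular way. For instance, the process $(\nu n_\lambda)(n_\gamma[\mathit{in}\,m_\mu])$ is represented by $(\{(\hat n_i)_\gamma^{\,@}\}, \{{}^{(\hat n_i)_\gamma}\mathit{in}\,m_\mu\})$, where $\hat n_i \in \hat{\mathcal{N}}_I$ is the fresh name provided by the indexed label attached to the restriction, namely $\varphi_{\mathcal{L}_I}(\lambda) = \hat n_i$.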

Notice that, since in this representation the constructs of restriction, parallel composition and ambient are implicitly represented, the standard structural rules and structural congruence (including $\alpha$-conversion) of the reduction semantics are no longer necessary.

States. Formally, a state is a pair which consists of a topology and a configura- 
tion: the topology is a standard representation of a tree via a set of pairs (son, 
father) and the configuration is simply a set of pairs associating each active 
process to its enclosing ambient. 

Definition 3.1 (States). A state $S$ is a pair $(T, C)$ where

1. $T \in \wp(\mathcal{A} \times \mathcal{A})$ such that, if $(a, b), (a, c) \in T$ then $b = c$ (topology);

2. $C \in \wp(\mathcal{A} \times \mathcal{AP})$ (configuration).

We extend to states in the obvious way the notions of labels, renaming, substitution, and equivalence up to renaming and substitution of indexes $\sim$. Moreover, we use $\mathcal{S}$ for the set of states.

Normalization Function. In order to derive the initial state from a process and to handle the processes which become executable after a step, we introduce a normalization function $\delta : (\mathcal{A} \times \mathcal{P}) \to \mathcal{S}$. Intuitively, $\delta(a, P)$ gives the state representing process $P$ assuming that $P$ is contained in ambient $a$. Thus, the initial state corresponding to a process $P$ is $\delta(@, P)$.



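$$
\begin{array}{lrcl}
\textsf{DRes} & \delta^{a}(\nu n_\lambda)P & = & \delta^{a}\big(P[\varphi_{\mathcal{L}_I}(\lambda)/n]\big)\\
\textsf{DAmb} & \delta^{a}b[P] & = & \delta^{b}P \cup (\{b^{a}\}, \emptyset)\\
\textsf{DZero} & \delta^{a}\mathbf{0} & = & (\emptyset, \emptyset)\\
\textsf{DPar} & \delta^{a}(P \mid Q) & = & \delta^{a}P \cup \delta^{a}Q\\
\textsf{DBang} & \delta^{a}!P & = & (\emptyset, \{{}^{a}!P\})\\
\textsf{DPref} & \delta^{a}M_\lambda.P & = & (\emptyset, \{{}^{a}M_\lambda.P\})
\end{array}
$$

Table 1. The Normalization Function $\delta$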



The normalization function $\delta$ is shown in Tab. 1 (where $\delta^{a}P$ stands for $\delta(a, P)$). Rule DRes eliminates the restriction by replacing the bound name $n$ with the fresh name $\varphi_{\mathcal{L}_I}(\lambda)$ provided by the indexed label $\lambda$. Rule DAmb adds ambient $b$ to the topology as son of the enclosing ambient $a$ and normalises the process contained in $b$. Rule DPar gathers the processes and the topologies built in each of its two branches. The last rules simply add the active process to the configuration.
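The clauses of Tab. 1 translate almost literally into a recursive function. The Python sketch below is illustrative only: the tuple encoding of processes, the representation of labelled names as `(name, label)` pairs and the helper `fresh_name`, which stands in for $\varphi_{\mathcal{L}_I}$, are assumptions of this sketch rather than definitions from the paper.

```python
from typing import FrozenSet, Tuple

# Hypothetical tuple encoding of processes (assumption of this sketch):
#   ("zero",)                    0
#   ("par", P, Q)                P | Q
#   ("res", n, label, P)         (nu n_label) P
#   ("amb", n, label, P)         n_label[P]
#   ("bang", P)                  !P
#   ("pref", cap, n, label, P)   cap n_label . P, with cap in {"in", "out", "open"}
# A labelled ambient name is a pair (n, label); the outermost ambient is "@".
State = Tuple[FrozenSet, FrozenSet]  # (set of (son, father), set of (ambient, process))


def fresh_name(label: str) -> str:
    """Hypothetical stand-in for the injective map from indexed labels to fresh names."""
    return "^" + label


def subst(p, new: str, old: str):
    """P[new/old]; `new` is assumed fresh, so no variable capture can occur."""
    kind = p[0]
    if kind == "zero":
        return p
    if kind == "par":
        return ("par", subst(p[1], new, old), subst(p[2], new, old))
    if kind == "bang":
        return ("bang", subst(p[1], new, old))
    if kind == "res":
        _, n, lab, body = p
        # an inner restriction of the same name shadows `old`
        return p if n == old else ("res", n, lab, subst(body, new, old))
    if kind == "amb":
        _, n, lab, body = p
        return ("amb", new if n == old else n, lab, subst(body, new, old))
    _, cap, n, lab, body = p  # "pref"
    return ("pref", cap, new if n == old else n, lab, subst(body, new, old))


def normalize(a, p) -> State:
    """delta(a, P): the state representing P when P sits inside ambient a."""
    kind = p[0]
    if kind == "zero":                                   # DZero
        return frozenset(), frozenset()
    if kind == "par":                                    # DPar
        t1, c1 = normalize(a, p[1])
        t2, c2 = normalize(a, p[2])
        return t1 | t2, c1 | c2
    if kind == "res":                                    # DRes
        _, n, lab, body = p
        return normalize(a, subst(body, fresh_name(lab), n))
    if kind == "amb":                                    # DAmb
        _, n, lab, body = p
        b = (n, lab)
        t, c = normalize(b, body)
        return t | {(b, a)}, c
    return frozenset(), frozenset({(a, p)})              # DBang, DPref
```

Applied to the first example of this section, `normalize("@", ...)` yields the topology $\{a^{@}, b^{a}, c^{@}\}$ and three configuration entries, two inside $a$ and one inside $c$.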

Transitions. Transitions are obtained by the rules of Tab. 2, which realize the unfolding of recursion, the movement in and out of ambients, and the opening of ambients. The rules use the normalization function to handle the continuations. Moreover, they use a function $\mathit{new} : \mathcal{P} \to \mathcal{P}$ such that $\mathit{new}(P) = Q$, where $Q = P\rho_t$ for some fresh renaming of indexes $\rho_t$. In the following, $T[b^{a}/b^{c}]$ denotes the replacement in the topology $T$ of every pair $b^{c}$ with $b^{a}$; analogously, $C[{}^{a}Q/{}^{c}Q]$ denotes the replacement in the configuration $C$ of every pair ${}^{c}Q$ with ${}^{a}Q$.



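$$
\begin{array}{ll}
\textsf{Bang} & \dfrac{{}^{a}!P \in C}{(T, C) \longrightarrow \delta^{a}\,\mathit{new}(P) \cup (T, C)}\\[16pt]
\textsf{In} & \dfrac{c = {}^{a}\mathit{in}\,m_\lambda.P \in C \qquad a^{b},\ m_\mu^{\,b} \in T \qquad a \neq m_\mu}{(T, C) \longrightarrow \delta^{a}P \cup \big((T \setminus \{a^{b}\}) \cup \{a^{m_\mu}\},\ C \setminus \{c\}\big)}\\[16pt]
\textsf{Out} & \dfrac{c = {}^{a}\mathit{out}\,m_\lambda.P \in C \qquad a^{m_\mu},\ m_\mu^{\,b} \in T}{(T, C) \longrightarrow \delta^{a}P \cup \big((T \setminus \{a^{m_\mu}\}) \cup \{a^{b}\},\ C \setminus \{c\}\big)}\\[16pt]
\textsf{Open} & \dfrac{c = {}^{a}\mathit{open}\,m_\lambda.P \in C \qquad m_\mu^{\,a} \in T}{(T, C) \longrightarrow \delta^{a}P \cup \big((T \setminus \{m_\mu^{\,a}\})[b^{a}/b^{m_\mu}],\ (C \setminus \{c\})[{}^{a}Q/{}^{m_\mu}Q]\big)}
\end{array}
$$

Table 2. Transitions $\longrightarrow$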



Rule Bang creates a fresh copy (equivalent up to renaming of indexes) of the 
process under replication. The last three rules correspond to the usual reduction 
rules of movements and opening (shown in the Introduction). Rule In verifies 
that the capability in m is enabled by checking if there exists a parallel ambient 
named m. Then, it modifies both the topology and the configuration accordingly: 
(i) it updates the father of a, (ii) it removes the executed capability and adds 
its (normalized) continuation. Out acts in an analogous way. Rule Open also 
extends the state by adding the set of processes and subambients acquired by a 
from the destroyed ambient m. 
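The movement and opening rules can be prototyped directly on sets of pairs. The following Python sketch is a hypothetical illustration, not the authors' implementation: it re-implements $\delta$ for restriction-free processes only, represents topologies as sets of `(son, father)` pairs, and omits rule Bang, which would additionally require the fresh index renaming $\mathit{new}$.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding (assumption): ("zero",), ("par", P, Q), ("amb", n, label, P),
# ("bang", P), ("pref", cap, n, label, P); labelled names are (n, label); "@" is the root.
State = Tuple[FrozenSet, FrozenSet]


def normalize(a, p) -> State:
    """delta(a, P) for restriction-free P (DZero, DPar, DAmb, DBang, DPref)."""
    if p[0] == "zero":
        return frozenset(), frozenset()
    if p[0] == "par":
        t1, c1 = normalize(a, p[1])
        t2, c2 = normalize(a, p[2])
        return t1 | t2, c1 | c2
    if p[0] == "amb":
        b = (p[1], p[2])
        t, c = normalize(b, p[3])
        return t | {(b, a)}, c
    return frozenset(), frozenset({(a, p)})


def successors(state: State) -> Set[State]:
    """One-step successors by rules In, Out and Open (Bang omitted in this sketch)."""
    t, conf = state
    father = dict(t)                      # T is a function from sons to fathers
    result = set()
    for c in conf:
        a, proc = c
        if proc[0] != "pref":
            continue
        _, cap, m, _label, cont = proc
        dt, dc = normalize(a, cont)       # normalised continuation, still inside a
        rest = conf - {c}                 # the executed capability is consumed
        b = father.get(a)
        if cap == "in":                   # a^b and m_mu^b in T, with a != m_mu
            for sib, f in t:
                if f == b and sib[0] == m and sib != a:
                    result.add(((t - {(a, b)}) | {(a, sib)} | dt, rest | dc))
        elif cap == "out":                # a^{m_mu} and m_mu^{b'} in T
            if b in father and b[0] == m:
                result.add(((t - {(a, b)}) | {(a, father[b])} | dt, rest | dc))
        elif cap == "open":               # m_mu^a in T: m's contents move to a
            for child, f in t:
                if f == a and child[0] == m:
                    t2 = frozenset((x, a if y == child else y) for x, y in t if x != child)
                    c2 = frozenset((a if amb == child else amb, q) for amb, q in rest)
                    result.add((t2 | dt, c2 | dc))
    return result
```

Because a topology is a function from sons to fathers, the sketch can read the father of $a$ with a dictionary lookup. Rule Open then rewires every son of the destroyed ambient to $a$.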

The Collecting Semantics. We can now define the core of the abstract interpretation framework, the collecting semantics. In order to have a correct semantics we have to consider only processes and states which satisfy some restrictions on indexed names and labels, ensuring that multiple occurrences of the same process or of ambients with the same name are kept distinct.
Definition 3.2 (Well-Labelled). A process $P$ is well-labelled if: (i) for each $\hat n_i \in n(P)$ such that $\hat n_i = \varphi_{\mathcal{L}_I}(\lambda)$, $\lambda \notin \Lambda(P)$; (ii) the (indexed) labels used in capabilities, ambients and restrictions are distinct from each other. A state $S = (T, C)$ is well-labelled if condition (i) holds (with $P$ replaced by $S$) and for each ${}^{a}P \in C$, $P$ is well-labelled.

In the following we assume that both $\mathcal{P}$ and $\mathcal{S}$ range only over well-labelled processes and states.

The domain is the power-set of well-labelled states up to renaming and substitution of indexes. We use $[S]$ for equivalence classes of states with respect to $\sim$. We assume that $\subseteq$ and $\cup$ over states are defined component-wise.

Definition 3.3. The concrete domain is $(\wp(\mathcal{S}/_{\sim}), \subseteq)$.

The concrete semantics is defined in a standard way as the least fixed-point 
of a continuous operator. 

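Definition 3.4 (Collecting Semantics). Let $X \in \wp(\mathcal{S}/_{\sim})$ and let $P \in \mathcal{P}$ be a process. We define $\mathcal{S}_{coll}[\![P]\!] = \mathit{lfp}_{\{[\delta^{@}P]\}}\,\mathcal{F}$, where $\mathcal{F}(X) = \bigcup_{[S] \in X} \{[S'] \mid S \longrightarrow S'\}$.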
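Operationally, the collecting semantics is the set of states reachable from $[\delta^{@}P]$, i.e. the limit of the Kleene iteration of $\mathcal{F}$ from the initial state. A minimal worklist sketch in Python follows. It assumes that states are already in a canonical form, so that Python equality stands in for $\sim$. Because the concrete semantics may be infinite (see Example 3.6), the sketch stops with an error after a hypothetical `limit` of states.

```python
from typing import Callable, Hashable, Iterable, Set, TypeVar

S = TypeVar("S", bound=Hashable)


def collecting(init: S, succ: Callable[[S], Iterable[S]], limit: int = 100_000) -> Set[S]:
    """Least set of states containing `init` and closed under `succ`.

    States are assumed canonical, so equality of Python values plays the role of
    equivalence up to renaming and substitution of indexes.
    """
    seen = {init}
    work = [init]
    while work:
        s = work.pop()
        for t in succ(s):
            if t not in seen:
                if len(seen) >= limit:
                    # e.g. replication can generate unboundedly many concrete states
                    raise RuntimeError("state limit reached")
                seen.add(t)
                work.append(t)
    return seen
```

For instance, `collecting(0, lambda x: {(x + 1) % 5})` returns the five states `{0, 1, 2, 3, 4}`.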

Examples. The following example explains in more detail why we consider only well-labelled processes.

Example 3.5. Labels have to be distinct to guarantee that the translation function $\delta$ does not lose different instances of the same process or of ambients with the same name. For instance, for $P = \mathit{in}\,m_\lambda \mid \mathit{in}\,m_\lambda$ we would have $\delta(@, P) = (\emptyset, \{{}^{@}\mathit{in}\,m_\lambda\})$. The remaining condition, concerning the free names of $\hat{\mathcal{N}}_I$, is necessary for a correct treatment of restrictions. For instance, for $Q = (\nu n_\lambda)(n_\mu[\mathbf{0}]) \mid (\hat n_i)_\gamma[\mathbf{0}]$, where $\varphi_{\mathcal{L}_I}(\lambda) = \hat n_i$, we would have $\delta(@, Q) = (\{(\hat n_i)_\mu^{\,@}, (\hat n_i)_\gamma^{\,@}\}, \emptyset)$. Even if the two ambients are kept distinct by labels $\mu$ and $\gamma$, this is not correct because they share the same name $\hat n_i$.
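Because a configuration is a set, two identical active processes in the same ambient collapse into a single pair. Condition (ii) of Definition 3.2 prevents exactly this. The sketch below assumes a hypothetical tuple encoding of processes. It collects the labels of capabilities, ambients and restrictions and checks that none is repeated.

```python
from collections import Counter

# Hypothetical encoding (assumption): ("zero",), ("par", P, Q), ("res", n, label, P),
# ("amb", n, label, P), ("bang", P), ("pref", cap, n, label, P).


def labels(p) -> list:
    """Labels attached to capabilities, ambients and restrictions of p, with repetitions."""
    kind = p[0]
    if kind == "zero":
        return []
    if kind == "par":
        return labels(p[1]) + labels(p[2])
    if kind == "bang":
        return labels(p[1])
    if kind in ("res", "amb"):
        return [p[2]] + labels(p[3])
    if kind == "pref":
        return [p[3]] + labels(p[4])
    raise ValueError(f"unknown process constructor {kind!r}")


def labels_distinct(p) -> bool:
    """Condition (ii) of Definition 3.2: no label is used twice."""
    return all(count == 1 for count in Counter(labels(p)).values())
```

With `cap = ("pref", "in", "m", "lam", ("zero",))`, the process `("par", cap, cap)` fails the check. The set `{("@", cap), ("@", cap)}` indeed has a single element.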

The following example stresses an important aspect concerning indexed la- 
bels and names: replication produces processes equivalent up to renaming and 
substitution of indexes. 

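Example 3.6. Consider the process $Q = n_\lambda[\mathit{in}\,n_\gamma]$, where $\lambda = \ell'_1$ and $\gamma = \ell_1$. We have $\mathcal{S}_{coll}[\![!Q]\!] = X$, such that $X$ is the minimal set of states (up to renaming and substitution of indexes) where

1. $(\emptyset, \{{}^{@}!Q\}) \in X$, and for each $(T, C) \in X$, ${}^{@}!Q \in C$;

2. for each $k \in \mathbb{N}$ there exists $S = (T, C) \in X$ such that $\Lambda(S) \subseteq \bigcup_{j \in \{1, \ldots, k\}} \{\ell'_j, \ell_j\}$ and $\mathit{fn}(S) = \{n\}$, and for each $j \in \{1, \ldots, k\}$ either $n_{\ell'_j}^{\,@} \in T$ and ${}^{n_{\ell'_j}}\mathit{in}\,n_{\ell_j} \in C$, or $n_{\ell'_j}^{\,n_{\ell'_h}} \in T$, with $h \neq j$, and ${}^{n_{\ell'_j}}\mathit{in}\,n_{\ell_j} \notin C$.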

The idea is that every unfolding of recursion produces a new copy of ambient $n$ labelled by $\ell'_j$ for a fresh index $j$. Any ambient $n_{\ell'_j}$ may enter any other ambient $n_{\ell'_h}$, provided that $h \neq j$.
4 A First Abstraction

We devise a first abstraction aimed at capturing the following property: for each ambient, which ambients and capabilities can be contained inside it. We obtain a safe and computable approximation of this property by combining two basic abstractions.

We begin by considering abstract states where the topology is a hierarchy of ambients, possibly not a tree, and where the configuration also contains the father of the enclosing ambient of any active process. For example, a state $S = (\{a^{@}, b^{@}\}, \{{}^{a}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma\})$ is abstracted to $S^* = (\{a^{@}, b^{@}\}, \{{}^{a^{@}}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma\})$. The abstract state is equivalent to the concrete one with a minor modification: every active process is annotated by a partial topology, that is, by the father of the enclosing ambient. For instance, we have ${}^{a^{@}}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma$, because $\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma$ is an active process of ambient $a$ whenever $a$ is contained in $@$.

To understand the relevance of the partial topology it is necessary to look at the abstraction of a set of states. Consider a state $S' = (\{a^{b}, b^{@}\}, \{{}^{a}\mathit{in}\,m_\gamma\})$, which is abstracted to $S'^* = (\{a^{b}, b^{@}\}, \{{}^{a^{b}}\mathit{in}\,m_\gamma\})$. The set $\{S, S'\}$ is abstracted to the unique abstract state which is the union:

$$(\{a^{b}, b^{@}, a^{@}\},\ \{{}^{a^{@}}\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma,\ {}^{a^{b}}\mathit{in}\,m_\gamma\}).$$

Obviously the obtained abstract topology is not a tree. For instance, we have both $a^{b}$ and $a^{@}$. In this case the partial topology allows us to distinguish between the possibly multiple fathers of an ambient. For instance, it says that process $\mathit{in}\,k_\beta.\mathit{in}\,m_\gamma$ is executable when $a$ is inside $@$, while $\mathit{in}\,m_\gamma$ is executable when $a$ is inside $b$.
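The first ingredient of the abstraction can be sketched as follows: each active process is annotated with the father of its enclosing ambient, and abstract states are then joined by union. This is a hypothetical Python illustration. It keeps processes as opaque strings and ignores multiplicities, which the second ingredient, described next, handles.

```python
from typing import FrozenSet, Tuple

# Concrete state: (T, C) with T a set of (son, father) pairs and C a set of
# (ambient, process) pairs; processes are opaque strings here (assumption).
AbsState = Tuple[FrozenSet, FrozenSet]


def annotate(state: Tuple[FrozenSet, FrozenSet]) -> AbsState:
    """Attach to each active process the partial topology a^b of its enclosing ambient."""
    t, c = state
    father = dict(t)
    # "@" has no father; this sketch records None in that case (its own choice)
    return t, frozenset(((a, father.get(a)), p) for a, p in c)


def join(*abstract_states: AbsState) -> AbsState:
    """Component-wise union of abstract states (multiplicities ignored here)."""
    t = frozenset().union(*(s[0] for s in abstract_states))
    c = frozenset().union(*(s[1] for s in abstract_states))
    return t, c
```

On the two states of the text, `join(annotate(S), annotate(S1))` returns the topology `{a^@, b^@, a^b}` together with one configuration entry per partial topology of `a`.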

This abstraction alone is of course not enough to achieve a computable semantics, since we may have infinitely many processes equivalent up to renaming of indexes (see Example 3.6). Hence, we also abstract indexes, keeping only the following information concerning labels: whether there is at most one occurrence or any number of occurrences.

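For example, consider the states

$$S = (\{n_\lambda^{@}\}, \{{}^{n_\lambda}\mathit{open}\,m_\mu\}) \quad\text{and}\quad S' = (\{n_\lambda^{@}\}, \{{}^{n_\lambda}\mathit{open}\,m_\mu,\ {}^{n_\lambda}\mathit{open}\,m_\gamma\})$$

where $\lambda = \ell'_1$, $\mu = \ell_1$ and $\gamma = \ell_2$. The abstraction gives the following $S^*$ and $S'^*$:

$$S^* = (\{n_{\ell'_1}^{@}\}, \{{}^{n_{\ell'_1}^{@}}\mathit{open}\,m_{\ell_1}\}) \quad\text{and}\quad S'^* = (\{n_{\ell'_1}^{@}\}, \{{}^{n_{\ell'_1}^{@}}\mathit{open}\,m_{\ell_\omega}\}).$$

Capability $\mathit{open}\,m$ in state $S$ is abstracted to $\mathit{open}\,m_{\ell_1}$, and the two copies of $\mathit{open}\,m$ in state $S'$ are abstracted to $\mathit{open}\,m_{\ell_\omega}$. The label $\ell_\omega$ (with multiplicity $\omega$) represents any number of occurrences of the corresponding object equivalent up to renaming of indexes, while a label $\ell_1$ (with multiplicity one) represents at most one occurrence of the corresponding object.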

Abstract Domain. Abstract labels are $\mathcal{L}^* = \{\ell_1, \ell_\omega \mid \ell \in \mathcal{L}\}$ (ranged over by $\lambda^*, \mu^*, \gamma^*, \ldots$). Abstract names are $\mathcal{N} \cup \hat{\mathcal{N}}$ (ranged over by $n^*, m^*, k^*, \hat n^*, \ldots$). Analogously, we use $\mathcal{A}^*$ (ranged over by $a^*, b^*, c^*, \ldots$) for the set of abstract labelled names $n^*_{\lambda^*}$, augmented with the distinct symbol $@$. The relation between names and labels has to be modified accordingly. We define $\varphi_{\mathcal{L}^*} : \mathcal{L}^* \to \hat{\mathcal{N}}$ such that $\varphi_{\mathcal{L}^*}(\ell_1) = \varphi_{\mathcal{L}^*}(\ell_\omega) = \varphi_{\mathcal{L}}(\ell)$.

Abstract processes are built according to the syntax of Def. 2.1 over names $\mathcal{N} \cup \hat{\mathcal{N}}$ and labels $\mathcal{L}^*$. In the following we use $\mathcal{P}^*$ and $\mathcal{AP}^*$ for the sets of abstract processes and active abstract processes.

Definition 4.1 (Abstract States). An abstract state $S^*$ is a pair $(T^*, C^*)$ where

1. $T^* \in \wp(\mathcal{A}^* \times \mathcal{A}^*)$ (abstract topology);

2. $C^* \in \wp((\mathcal{A}^* \times \mathcal{A}^*) \times \mathcal{AP}^*)$ (abstract configuration).

All the previously defined notions on states and processes are adapted to 
abstract states and processes in the expected way. Only the notion of well- 
labelling requires some care. 

Definition 4.2 (Well-Labelled). An abstract process $P^*$ is well-labelled if: (i) for each $\hat n \in n(P^*)$ such that $\hat n = \varphi_{\mathcal{L}^*}(\lambda^*)$, $\lambda^* \notin \Lambda(P^*)$; (ii) $\ell_1 \in \Lambda(P^*)$ implies $\ell_\omega \notin \Lambda(P^*)$ (and vice versa); (iii) there is at most one occurrence of any label $\ell_1$. An abstract state $S^* = (T^*, C^*)$ is well-labelled if conditions (i) and (ii) hold (with $P^*$ replaced by $S^*$) and for each ${}^{a^{b}}P^* \in C^*$, $P^*$ is well-labelled.



In the following, we consider only well-labelled abstract states and processes, denoted by $\mathcal{S}^*$ and $\mathcal{P}^*$, respectively.

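We consider an ordering over abstract states which reflects the intuition that $\ell_1$ is more precise than $\ell_\omega$. Therefore, let $\sqsubseteq^*$ be the minimal ordering over $\mathcal{S}^*$ such that $S^* \subseteq S'^*$ implies $S^* \sqsubseteq^* S'^*$, and such that $S^* \sqsubseteq^* S^*[\ell_\omega/\ell_1]$. As usual we use $\sqcup^*$ for the least upper bound w.r.t. $\sqsubseteq^*$. Notice that for $\lambda = \ell_1$ and $\gamma = \ell_\omega$

$$(\emptyset, \{{}^{a^{b}}M_\lambda.P^*\}) \sqcup^* (\emptyset, \{{}^{a^{b}}M_\gamma.P^*\}) = (\emptyset, \{{}^{a^{b}}M_\gamma.P^*\}).$$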



Definition 4.3. The abstract domain is $(\mathcal{S}^*, \sqsubseteq^*)$.

To simplify the notation, in the following we may omit the superscript $*$ for any syntactic category, when the meaning is clear from the context.

The Galois Connection. We now present the relation between the concrete and the abstract domain, establishing a standard Galois connection (see Appendix A). A state is abstracted, as explained at the beginning of the section, by introducing the partial topology in processes, by replacing indexed labels $\mathcal{L}_I$ by labels with multiplicity $\mathcal{L}^*$, and by modifying names $\hat{\mathcal{N}}_I$ accordingly. A set of states is abstracted to the abstract state containing the abstraction of each element.

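Let $S \in \mathcal{S}$ be a state. We define $\rho_S : \mathcal{L}_I \to \mathcal{L}^*$ such that $\rho_S(\ell_i) = \ell_\omega$ if $\ell_i, \ell_j \in \Lambda(S)$ with $i \neq j$, and $\rho_S(\ell_i) = \ell_1$ otherwise. Moreover, we define $\eta_S : \hat{\mathcal{N}}_I \to \hat{\mathcal{N}}$ such that $\eta_S(\hat n_i) = \hat n$.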
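The renaming $\rho_S$ only needs, for each base label, the set of indexes with which it occurs in $S$. Below is a minimal Python sketch. It uses a hypothetical encoding: indexed labels are pairs `(l, i)`, and abstract labels are `(l, "1")` or `(l, "w")`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

IndexedLabel = Tuple[str, int]    # (l, i) stands for l_i
AbstractLabel = Tuple[str, str]   # (l, "1") or (l, "w") stands for l_1 or l_omega


def rho(labels: Iterable[IndexedLabel]) -> Dict[IndexedLabel, AbstractLabel]:
    """rho_S: l_i becomes l_omega if some l_j with j != i also occurs, else l_1."""
    labels = list(labels)
    indexes = defaultdict(set)
    for base, i in labels:
        indexes[base].add(i)
    return {(base, i): (base, "w" if len(indexes[base]) > 1 else "1") for base, i in labels}


def eta(name: Tuple[str, int]) -> str:
    """eta_S: an indexed fresh name (n, i) is mapped to n, dropping the index."""
    return name[0]
```

On the labels of state $S'$ above ($\ell'_1, \ell_1, \ell_2$), `rho` keeps $\ell'$ with multiplicity one and maps both $\ell_1$ and $\ell_2$ to $\ell_\omega$.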
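Definition 4.4. Let $X \in \wp(\mathcal{S}/_{\sim})$, $S = (T, C) \in \mathcal{S}$ and $S^* \in \mathcal{S}^*$. We define $\alpha : \wp(\mathcal{S}/_{\sim}) \to \mathcal{S}^*$ and $\gamma : \mathcal{S}^* \to \wp(\mathcal{S}/_{\sim})$ as follows:

1. $\alpha((T, C)) = (T, C')\rho_S\eta_S$, where $C' = \{{}^{a^{b}}P \mid a^{b} \in T,\ {}^{a}P \in C\}$;

2. $\alpha(X) = \bigsqcup^*_{[S] \in X} \alpha([S])$, where $\alpha([S]) = \bigsqcup^*_{S' \in [S]} \alpha(S')$;

3. $\gamma(S^*) = \{[S] \mid \alpha([S]) \sqsubseteq^* S^*\}$.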



Lemma 4.5 (Galois Insertion). The pair of functions $(\alpha, \gamma)$ forms a Galois insertion between $(\wp(\mathcal{S}/_{\sim}), \subseteq)$ and $(\mathcal{S}^*, \sqsubseteq^*)$.



Abstract Semantics. The abstract normalization function $\delta^* : (\mathcal{A}^* \times \mathcal{A}^*) \times \mathcal{P}^* \to \mathcal{S}^*$, shown in Tab. 3, is the obvious adaptation of $\delta$ to the abstract domain, where $\sqcup^*$ is used instead of $\cup$ to properly handle labels with multiplicity.



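$$
\begin{array}{rcl}
\delta^*(a^{b}, (\nu n_\lambda)P) & = & \delta^*(a^{b}, P[\varphi_{\mathcal{L}^*}(\lambda)/n])\\
\delta^*(a^{b}, c[P]) & = & \delta^*(c^{a}, P) \sqcup^* (\{c^{a}\}, \emptyset)\\
\delta^*(a^{b}, \mathbf{0}) & = & (\emptyset, \emptyset)\\
\delta^*(a^{b}, P \mid Q) & = & \delta^*(a^{b}, P) \sqcup^* \delta^*(a^{b}, Q)\\
\delta^*(a^{b}, !P) & = & (\emptyset, \{{}^{a^{b}}!P\})\\
\delta^*(a^{b}, M_\lambda.P) & = & (\emptyset, \{{}^{a^{b}}M_\lambda.P\})
\end{array}
$$

Table 3. The Normalization Function $\delta^*$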



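The abstract transitions are obtained by the rules of Tab. 4. The rules use a function $\mathit{new}_\omega : \mathcal{P}^* \to \mathcal{P}^*$ such that $\mathit{new}_\omega(P) = P\rho$ for the renaming $\rho$ where $\rho(\ell_1) = \ell_\omega$ for any $\ell_1 \in \Lambda(P)$. We also use

$$C \setminus_1 \{{}^{a^{b}}M_\lambda.P\} = \begin{cases} C \setminus \{{}^{a^{b}}M_\lambda.P\} & \text{if } \lambda = \ell_1\\ C & \text{otherwise.} \end{cases}$$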

The rules are similar to the ones of Tab. 2, so we explain only the relevant differences. The rule for the unfolding of replication creates a copy of the replicated process where every label has multiplicity $\omega$, instead of creating a fresh copy (equivalent up to renaming of indexes). Consider the rule corresponding to the execution of a capability $\mathit{in}\,m$. First, it verifies that the capability is executable by checking whether there exists an ambient named $m$ which is contained in the father $b$ of $a$. Then, it extends both the topology and the configuration accordingly: (i) it adds the new father $m_{\mu^*}$ to $a$; (ii) it adds both the normalized continuation and the parallel processes to the set of processes executable when $a$ is contained in $m_{\mu^*}$. The other rules are adapted in an analogous way. Notice that in the rule for opening, the processes executable in any ambient $c$ when $c$ is contained in the destroyed ambient also have to be updated.

It is important to stress that both the partial topology and the multiplicity of labels are fruitfully exploited. Consider for instance the rule corresponding to the execution of a capability $\mathit{in}\,m$. The father $b$ of $a$, recorded in the partial topology, is used both to establish the enabling of $\mathit{in}\,m$ and to find out which processes are executable inside $a$ in parallel with $\mathit{in}\,m.P$. The movement of an ambient named $m$ inside itself is permitted only if the multiplicity of $m$ is $\omega$. Moreover, a capability $\mathit{in}\,m$ with multiplicity one can be exercised only once, and so it is not executable in the new father $m_{\mu^*}$ of $a$. By contrast, a capability $\mathit{in}\,m$ with multiplicity $\omega$ is still executable in the new father $m_{\mu^*}$ of $a$, like the other parallel processes.
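A small sketch can make concrete how the partial topology and multiplicities enter the abstract execution of $\mathit{in}\,m$. It follows the prose description above rather than a transcription of the rule itself, and its encoding is hypothetical. Abstract labelled names are `(name, label, multiplicity)` triples and configuration entries are `((a, b), process)` pairs. The continuation is assumed to be already normalised into a set of active processes.

```python
from typing import FrozenSet, Tuple

# Assumptions of this sketch: labelled names are (name, label, mult) with mult "1" or "w";
# T holds (son, father) pairs; C holds ((a, b), process) pairs, a^b being the partial
# topology; an "in" capability is ("in", m, mult, cont), cont a frozenset of processes.


def abstract_in(t: FrozenSet, c: FrozenSet, entry) -> Tuple[FrozenSet, FrozenSet]:
    """Extend (T, C) by executing the capability in `entry` = ((a, b), cap)."""
    (a, b), cap = entry
    _, m, cap_mult, cont = cap
    new_t, new_c = set(t), set(c)
    for target, father in t:
        if father != b or target[0] != m:
            continue                      # in m needs an ambient named m under b
        if target == a and target[2] != "w":
            continue                      # self-entry only with multiplicity omega
        new_t.add((a, target))
        new_c.update(((a, target), q) for q in cont)
        for (amb, fa), q in c:            # processes of a while a is under b
            if amb == a and fa == b and (q != cap or cap_mult == "w"):
                new_c.add(((a, target), q))
    return frozenset(new_t), frozenset(new_c)
```

In a run on Example 4.10's $!Q$, with $n$ of multiplicity $\omega$, the sketch adds $n^{n}$. With multiplicity one it leaves the state unchanged, and a multiplicity-one capability is not copied into the new father.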
Table 4. Abstract Transitions



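Definition 4.6 (The Abstract Collecting Semantics). Let $S^* \in \mathcal{S}^*$ and let $P \in \mathcal{P}$. We define $\mathcal{S}^*_{coll}[\![P]\!] = \mathit{lfp}_{\alpha(\{[\delta^{@}P]\})}\,\mathcal{F}^*$, where $\mathcal{F}^*(S^*) = \bigsqcup^* \{S'^* \mid S^* \longmapsto^* S'^*\}$.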

The abstract collecting semantics is a safe approximation of the concrete one, and so it is an upper approximation of the property we are interested in. Safeness is stated in classical abstract interpretation style (see Appendix A).

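Theorem 4.7 (Safeness). Let $S \in \mathcal{S}$ and $P \in \mathcal{P}$. We have $\alpha(\mathcal{F}(\{[S]\})) \sqsubseteq^* \mathcal{F}^*(\alpha(\{[S]\}))$ and $\alpha(\mathcal{S}_{coll}[\![P]\!]) \sqsubseteq^* \mathcal{S}^*_{coll}[\![P]\!]$.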
Examples. We present some examples to summarize the more interesting aspects of the abstraction. The following example shows the advantages obtained by combining multiplicity and partial topology.

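Example 4.8. Consider the process $P = n_\lambda[\mathit{in}\,k_\varepsilon.\mathit{in}\,m_\mu \mid m_\gamma[\mathbf{0}]] \mid k_\delta[\mathit{open}\,n_\beta]$, where the labels $\{\varepsilon, \mu, \lambda, \beta, \gamma, \delta\}$ are distinct also up to renaming of indexes. We have $\mathcal{S}_{coll}[\![P]\!] = \{S_0, S_1, S_2\}$ such that

$$\begin{aligned}
S_0 &= (\{n_\lambda^{@}, m_\gamma^{n_\lambda}, k_\delta^{@}\},\ \{{}^{n_\lambda}\mathit{in}\,k_\varepsilon.\mathit{in}\,m_\mu,\ {}^{k_\delta}\mathit{open}\,n_\beta\})\\
S_1 &= (\{n_\lambda^{k_\delta}, m_\gamma^{n_\lambda}, k_\delta^{@}\},\ \{{}^{n_\lambda}\mathit{in}\,m_\mu,\ {}^{k_\delta}\mathit{open}\,n_\beta\})\\
S_2 &= (\{m_\gamma^{k_\delta}, k_\delta^{@}\},\ \{{}^{k_\delta}\mathit{in}\,m_\mu\}).
\end{aligned}$$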



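Notice that capability $\mathit{in}\,m$ is not exercised inside $n$ because $m$ is not a sibling of $n$ when it is executable. Moreover, by opening $n$, $k$ acquires only ambient $m$ and capability $\mathit{in}\,m$. We have $\alpha(\{S_0, S_1, S_2\}) = (T^*, C^*)$, where (to improve readability we use $\{\varepsilon, \mu, \lambda, \beta, \gamma, \delta\}$ for the corresponding abstract labels with multiplicity one)

$$\begin{aligned}
T^* &= \{n_\lambda^{@},\ k_\delta^{@},\ n_\lambda^{k_\delta},\ m_\gamma^{n_\lambda},\ m_\gamma^{k_\delta}\}\\
C^* &= \{{}^{n_\lambda^{@}}\mathit{in}\,k_\varepsilon.\mathit{in}\,m_\mu,\ {}^{k_\delta^{@}}\mathit{open}\,n_\beta,\ {}^{n_\lambda^{k_\delta}}\mathit{in}\,m_\mu,\ {}^{k_\delta^{@}}\mathit{in}\,m_\mu\}.
\end{aligned}$$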



Let us compare the abstraction of the collecting semantics with the abstract collecting semantics. We have $\mathcal{S}^*_{coll}[\![P]\!] = (T^* \cup \{n_\lambda^{m_\gamma}\}, C^*)$. The abstract semantics is safe but not exact, i.e. $\alpha(\mathcal{S}_{coll}[\![P]\!]) \sqsubset^* \mathcal{S}^*_{coll}[\![P]\!]$. The labels with multiplicity and the partial topology are essential to have no loss of information in the configuration. Since $\mathit{in}\,k_\varepsilon$ has multiplicity one, it is no longer executable after the movement of $n$ into $k$. Therefore, $\mathit{in}\,m_\mu$ is the only process which is contained in $n$ when $n$ is inside $k$, and which is acquired by $k$ when opening $n$.

The subtle point is that, in the abstract semantics, $n$ moves inside $m$. In fact, we have that $\mathit{in}\,m$ is executable when $n$ is inside $k$, and that $k$ may contain ambient $m$. The abstract semantics cannot capture that $k$ effectively contains ambient $m$ only after the dissolution of $n$.

The following examples explain in more detail the role of indexes in the abstraction. As discussed above, any labelling of a process $P$ respecting the requirements of Definition 3.2 is enough to have a correct normal semantics of $P$. However, the choice of labels has dramatic consequences on the precision of the abstraction. Indeed, a typical scheme to annotate processes is to keep all labels distinct also up to renaming of indexes, so that only the copies produced by replication are identified by the abstraction.

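Example 4.9. Consider the processes

$$\begin{aligned}
P_1 &= n_{\ell_1}[\mathit{in}\,k_\mu] \mid n_{\ell_2}[\mathit{in}\,m_\gamma] \mid m_\beta[\mathbf{0}]\\
P_2 &= n_\lambda[\mathit{in}\,k_\mu] \mid n_\varepsilon[\mathit{in}\,m_\gamma] \mid m_\beta[\mathbf{0}]
\end{aligned}$$

where $\ell_1$ and $\ell_2$ share the base label $\ell$, while $\{\ell, \mu, \gamma, \lambda, \beta, \varepsilon\}$ are distinct also up to renaming of indexes. We have

$$\begin{aligned}
\mathcal{S}_{coll}[\![P_1]\!] &= \{[(\{n_{\ell_1}^{@}, n_{\ell_2}^{@}, m_\beta^{@}\}, \{{}^{n_{\ell_1}}\mathit{in}\,k_\mu, {}^{n_{\ell_2}}\mathit{in}\,m_\gamma\})],\ [(\{n_{\ell_1}^{@}, n_{\ell_2}^{m_\beta}, m_\beta^{@}\}, \{{}^{n_{\ell_1}}\mathit{in}\,k_\mu\})]\}\\
\mathcal{S}_{coll}[\![P_2]\!] &= \{[(\{n_\lambda^{@}, n_\varepsilon^{@}, m_\beta^{@}\}, \{{}^{n_\lambda}\mathit{in}\,k_\mu, {}^{n_\varepsilon}\mathit{in}\,m_\gamma\})],\ [(\{n_\lambda^{@}, n_\varepsilon^{m_\beta}, m_\beta^{@}\}, \{{}^{n_\lambda}\mathit{in}\,k_\mu\})]\}.
\end{aligned}$$

The collecting semantics of $P_1$ and $P_2$ are equivalent up to renaming of labels. In the abstract semantics we have (for readability we use $\{\mu, \gamma, \lambda, \beta, \varepsilon\}$ for the corresponding labels with multiplicity one)

$$\begin{aligned}
\mathcal{S}^*_{coll}[\![P_1]\!] &= (\{n_{\ell_\omega}^{@}, n_{\ell_\omega}^{m_\beta}, m_\beta^{@}\}, \{{}^{n_{\ell_\omega}^{@}}\mathit{in}\,k_\mu, {}^{n_{\ell_\omega}^{@}}\mathit{in}\,m_\gamma, {}^{n_{\ell_\omega}^{m_\beta}}\mathit{in}\,k_\mu\})\\
\mathcal{S}^*_{coll}[\![P_2]\!] &= (\{n_\lambda^{@}, n_\varepsilon^{@}, n_\varepsilon^{m_\beta}, m_\beta^{@}\}, \{{}^{n_\lambda^{@}}\mathit{in}\,k_\mu, {}^{n_\varepsilon^{@}}\mathit{in}\,m_\gamma\}).
\end{aligned}$$

Due to a different choice of labels, the two copies of ambient $n$ are identified by the abstract semantics of $P_1$, while they are kept distinct by the abstract semantics of $P_2$.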

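Example 4.10. Consider the process $Q = n_\lambda[\mathit{in}\,n_\gamma]$ of Example 3.6, where $\lambda = \ell'_1$ and $\gamma = \ell_1$. We have, for $\mu = \ell'_\omega$ and $\varepsilon = \ell_\omega$,

$$\begin{aligned}
\mathcal{S}^*_{coll}[\![Q]\!] &= (\{n_\lambda^{@}\}, \{{}^{n_\lambda^{@}}\mathit{in}\,n_\gamma\})\\
\mathcal{S}^*_{coll}[\![!Q]\!] &= (\{n_\mu^{@}, n_\mu^{n_\mu}\}, \{{}^{@}!Q, {}^{n_\mu^{@}}\mathit{in}\,n_\varepsilon, {}^{n_\mu^{n_\mu}}\mathit{in}\,n_\varepsilon\}).
\end{aligned}$$

The labels with multiplicity allow us to distinguish process $!Q$ from process $Q$. In the abstract semantics of $Q$ the label of $n$ is $\ell'_1$, which forbids the movement of $n$ inside itself. By contrast, in the abstract semantics of $!Q$ the unfolding of recursion produces a label $\ell'_\omega$ for $n$, which allows this movement. This result for $!Q$ is necessary to have a safe approximation of the concrete semantics, where the unfolding of replication produces multiple copies of $n$, which may interact with each other (see Example 3.6). In this example the abstract semantics is exact, namely $\mathcal{S}^*_{coll}[\![Q]\!] = \alpha(\mathcal{S}_{coll}[\![Q]\!])$ and $\mathcal{S}^*_{coll}[\![!Q]\!] = \alpha(\mathcal{S}_{coll}[\![!Q]\!])$.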

5 A Second Abstraction 

On top of the previous abstraction, we define a new, coarser abstraction aimed at computing a safe approximation of the same property more efficiently. The abstraction is obtained by simply dropping multiplicity from labels and partial topologies from processes. For example, consider the states shown in Section 4, where $\lambda = \ell'_1$, $\mu = \ell_1$ and $\gamma = \ell_2$:

$$S = (\{n_\lambda^{@}\}, \{{}^{n_\lambda}\mathit{open}\,m_\mu\}) \quad\text{and}\quad S' = (\{n_\lambda^{@}\}, \{{}^{n_\lambda}\mathit{open}\,m_\mu,\ {}^{n_\lambda}\mathit{open}\,m_\gamma\}).$$

In the new abstraction $S^*$ and $S'^*$ are represented by the same abstract state $(\{n_{\ell'}^{@}\}, \{{}^{n_{\ell'}}\mathit{open}\,m_\ell\})$, because the ability to distinguish one occurrence from multiple occurrences has been lost.
Abstract Domain. Abstract labels are $\mathcal{L}$ and abstract names are $\mathcal{N} \cup \hat{\mathcal{N}}$. Thus, the relation between names and labels is given by the function $\varphi_{\mathcal{L}} : \mathcal{L} \to \hat{\mathcal{N}}$ (see Section 2). We use $\mathcal{A}^\circ$ (ranged over by $a^\circ, b^\circ, c^\circ, \ldots$) for the set of abstract labelled names $n_\ell$ such that $n \in \mathcal{N} \cup \hat{\mathcal{N}}$ and $\ell \in \mathcal{L}$, augmented with the distinct symbol $@$. Abstract processes are built according to the syntax of Def. 2.1 over names $\mathcal{N} \cup \hat{\mathcal{N}}$ and labels $\mathcal{L}$. We use $\mathcal{P}^\circ$ and $\mathcal{AP}^\circ$ for the sets of abstract and active abstract processes.

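Definition 5.1 (Abstract States). An abstract state $S^\circ$ is a pair $(T^\circ, C^\circ)$ where

1. $T^\circ \in \wp(\mathcal{A}^\circ \times \mathcal{A}^\circ)$ (abstract topology);

2. $C^\circ \in \wp(\mathcal{A}^\circ \times \mathcal{AP}^\circ)$ (abstract configuration).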

All the previously defined notions on states and processes are adapted to 
abstract states and processes in the expected way. We use S° for the set of 
abstract states. 

The abstract domain is given by abstract states ordered by inclusion (we assume $\subseteq$ and $\cup$ defined component-wise).

Definition 5.2. The abstract domain is $(\mathcal{S}^\circ, \subseteq)$.

In the following we may omit the superscript $\circ$ for any syntactic category, when the meaning is clear from the context.

The Galois Connection. We now present the relation between the abstract domain of Def. 4.3 and the new abstract domain, establishing, as before, a Galois connection (see Appendix A).

An abstract state is abstracted, as explained at the beginning of the section, by dropping the multiplicity from labels and the partial topology from processes. To this purpose, we use a renaming $\rho^\circ : \mathcal{L}^* \to \mathcal{L}$ such that $\rho^\circ(\ell_1) = \rho^\circ(\ell_\omega) = \ell$.

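Definition 5.3. Let $S^* = (T^*, C^*) \in \mathcal{S}^*$ and $S^\circ \in \mathcal{S}^\circ$. We define $\alpha^\circ : \mathcal{S}^* \to \mathcal{S}^\circ$ and $\gamma^\circ : \mathcal{S}^\circ \to \mathcal{S}^*$ as follows:

1. $\alpha^\circ(S^*) = (T^*, \{{}^{a}P \mid {}^{a^{b}}P \in C^*\})\rho^\circ$;

2. $\gamma^\circ(S^\circ) = \bigsqcup^* \{S^* \mid \alpha^\circ(S^*) \subseteq S^\circ\}$.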
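Since $\alpha^\circ$ only forgets information, it is a simple structural map. Here is a Python sketch under a hypothetical encoding, in which abstract labels of the first abstraction are `Label(base, mult)` values that may appear anywhere inside names and processes:

```python
from dataclasses import dataclass

# Assumptions of this sketch: labelled names are (name, Label) pairs, processes are
# nested tuples, T holds (son, father) pairs and C holds ((a, b), process) entries.


@dataclass(frozen=True)
class Label:
    base: str
    mult: str  # "1" (at most one occurrence) or "w" (any number of occurrences)


def forget(x):
    """The renaming rho°: replace every Label, however deeply nested, by its base."""
    if isinstance(x, Label):
        return x.base
    if isinstance(x, tuple):
        return tuple(forget(y) for y in x)
    if isinstance(x, frozenset):
        return frozenset(forget(y) for y in x)
    return x


def alpha_circ(state):
    """alpha°: drop the partial topology from C and the multiplicities everywhere."""
    t, c = state
    return (frozenset(forget(pair) for pair in t),
            frozenset(forget((a, p)) for (a, _father), p in c))
```

Applied to $S^*$ and $S'^*$ of the example at the start of this section, `alpha_circ` returns the same abstract state for both.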

Lemma 5.4 (Galois Insertion). The pair of functions $(\alpha^\circ, \gamma^\circ)$ forms a Galois insertion between $(\mathcal{S}^*, \sqsubseteq^*)$ and $(\mathcal{S}^\circ, \subseteq)$.

Abstract Semantics. The abstract normalization function $\delta^\circ : \mathcal{A}^\circ \times \mathcal{P}^\circ \to \mathcal{S}^\circ$ is equal to the concrete one (shown in Tab. 1) with $\ell$ in place of $\lambda$ and $\varphi_{\mathcal{L}}$ in place of $\varphi_{\mathcal{L}_I}$. The abstract transitions are obtained by the rules of Tab. 5. The unfolding of replication simply creates a copy of the replicated process without modifying the labels. The rules for movements and opening are similar to the corresponding rules of the abstract semantics (see Tab. 4) restricted to multiplicity $\omega$. There is only one difference: the topology is used instead of the partial topology. For instance, a capability $\mathit{in}\,m$ is executed inside an ambient $a$ if $a$ and an ambient named $m$ have a common father in the topology.
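In the second abstraction the enabling condition for $\mathit{in}\,m$ no longer consults the partial topology: it suffices that $a$ and some ambient named $m$ share a father in $T$. Below is a hypothetical Python sketch of this check and of the corresponding extension of the state. It assumes that labelled names are `(name, label)` pairs and that the continuation is already normalised into a set of active processes.

```python
from typing import FrozenSet, Set, Tuple

# Assumptions of this sketch: T is a set of (son, father) pairs that need not form a
# tree; C is a set of (ambient, process) pairs; labelled names are (name, label).


def in_targets(t: FrozenSet, a, m: str) -> Set:
    """Ambients named m that share at least one father with a in T."""
    fathers_of_a = {f for son, f in t if son == a}
    return {son for son, f in t if f in fathers_of_a and son[0] == m}


def abstract_in(t: FrozenSet, c: FrozenSet, a, m: str, cont: FrozenSet) -> Tuple[FrozenSet, FrozenSet]:
    """Execute in m inside a: add every new father of a and the continuation."""
    targets = in_targets(t, a, m)
    if not targets:
        return t, c
    return t | {(a, x) for x in targets}, c | {(a, q) for q in cont}
```

On the topology of Example 5.7 below, with $n$ under both $@$ and $m$ and $k$ under $@$, `in_targets` finds $k$ through the common father $@$. This is precisely the imprecision the example discusses.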
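$$
\begin{array}{ll}
\textsf{Bang} & \dfrac{{}^{a}!P \in C}{(T, C) \longmapsto^\circ \delta^{\circ}(a, P) \cup (T, C)}\\[16pt]
\textsf{In} & \dfrac{{}^{a}\mathit{in}\,m_\ell.P \in C \qquad a^{b},\ m_{\ell'}^{\,b} \in T}{(T, C) \longmapsto^\circ \delta^{\circ}(a, P) \cup (T \cup \{a^{m_{\ell'}}\},\ C)}\\[16pt]
\textsf{Out} & \dfrac{{}^{a}\mathit{out}\,m_\ell.P \in C \qquad a^{m_{\ell'}},\ m_{\ell'}^{\,b} \in T}{(T, C) \longmapsto^\circ \delta^{\circ}(a, P) \cup (T \cup \{a^{b}\},\ C)}\\[16pt]
\textsf{Open} & \dfrac{{}^{a}\mathit{open}\,m_\ell.P \in C \qquad m_{\ell'}^{\,a} \in T}{(T, C) \longmapsto^\circ \delta^{\circ}(a, P) \cup (T \cup T[b^{a}/b^{m_{\ell'}}],\ C \cup C[{}^{a}Q/{}^{m_{\ell'}}Q])}
\end{array}
$$

Table 5. Abstract Transitions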



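Definition 5.5 (The Abstract Collecting Semantics). Let $S^\circ \in \mathcal{S}^\circ$ and let $P \in \mathcal{P}$. We define $\mathcal{S}^\circ_{coll}[\![P]\!] = \mathit{lfp}_{\alpha^\circ(\alpha(\{[\delta^{@}P]\}))}\,\mathcal{F}^\circ$, where $\mathcal{F}^\circ(S^\circ) = \bigcup \{S'^\circ \mid S^\circ \longmapsto^\circ S'^\circ\}$.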

The new abstract collecting semantics is a safe approximation of the abstract collecting semantics of Def. 4.6.

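Theorem 5.6 (Safeness). Let $S^* \in \mathcal{S}^*$ and $P \in \mathcal{P}$. We have $\alpha^\circ(\mathcal{F}^*(S^*)) \subseteq \mathcal{F}^\circ(\alpha^\circ(S^*))$ and $\alpha^\circ(\mathcal{S}^*_{coll}[\![P]\!]) \subseteq \mathcal{S}^\circ_{coll}[\![P]\!]$.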

It is a well-known result of abstract interpretation that Galois insertions are closed under composition. Therefore, Theorems 4.7 and 5.6 imply that the new abstract collecting semantics is a safe approximation of the concrete one [?].



Examples. There are several interesting examples showing the differences from the abstraction of Section 4. For instance, this abstraction does not distinguish between one and more than one occurrence. Thus, the processes $Q$ and $!Q$ of Example 4.10 are identified. Another loss of information is due to the removal of the partial topology, as the following example explains.

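Example 5.7. Consider the process $P = P_1 \mid P_2 \mid P_3$, where $P_1 = !n_\lambda[\mathit{in}\,m_\delta.\mathit{in}\,k_\mu]$, $P_2 = !m_\beta[\mathbf{0}]$ and $P_3 = !k_\gamma[\mathbf{0}]$. We assume that the labels $\{\lambda, \mu, \beta, \gamma, \delta\}$ are distinct also up to renaming of indexes. In the first abstraction we have (for readability we use $\{\lambda, \mu, \beta, \gamma, \delta\}$ for the corresponding abstract labels with multiplicity $\omega$) $\mathcal{S}^*_{coll}[\![P]\!] = (T^*, C^*)$, where

$$\begin{aligned}
T^* &= \{n_\lambda^{@},\ k_\gamma^{@},\ m_\beta^{@},\ n_\lambda^{m_\beta}\}\\
C^* &= \{{}^{n_\lambda^{@}}\mathit{in}\,m_\delta.\mathit{in}\,k_\mu,\ {}^{n_\lambda^{m_\beta}}\mathit{in}\,m_\delta.\mathit{in}\,k_\mu,\ {}^{n_\lambda^{m_\beta}}\mathit{in}\,k_\mu,\ {}^{@}P_1,\ {}^{@}P_2,\ {}^{@}P_3\}.
\end{aligned}$$

Capability $\mathit{in}\,k$ is not executed inside $n$, because it becomes executable only after the execution of $\mathit{in}\,m$, and $k$ cannot be contained in $m$.

In the new abstraction we have (for readability we use $\{\lambda, \mu, \beta, \gamma, \delta\}$ for the corresponding abstract labels without indexes) $\mathcal{S}^\circ_{coll}[\![P]\!] = (T^\circ, C^\circ)$, where

$$\begin{aligned}
T^\circ &= \{n_\lambda^{@},\ k_\gamma^{@},\ m_\beta^{@},\ n_\lambda^{m_\beta},\ n_\lambda^{k_\gamma}\}\\
C^\circ &= \{{}^{n_\lambda}\mathit{in}\,m_\delta.\mathit{in}\,k_\mu,\ {}^{n_\lambda}\mathit{in}\,k_\mu,\ {}^{@}P_1,\ {}^{@}P_2,\ {}^{@}P_3\}.
\end{aligned}$$

Now $\mathit{in}\,k$ is executed, because $n$ and $k$ have $@$ as a common father. The abstract semantics cannot capture that $\mathit{in}\,k$ becomes executable inside $n$ only after the movement inside $m$. Hence $\alpha^\circ(\alpha(\mathcal{S}_{coll}[\![P]\!])) \subset \mathcal{S}^\circ_{coll}[\![P]\!]$.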



6 Conclusions 

The main contribution of the paper is the definition of an abstract interpretation framework for MA, based on the normal semantics. The normal semantics uses an explicit representation of the topological structure of processes, in terms of topology and configuration, which is more amenable to abstraction than the standard reduction semantics. We have shown two safe abstractions establishing a property concerning the run-time topological structure of processes: the proposed abstract semantics are effectively program analyzers. Restricting attention to a process $P$ of size $n$, in the first case the topology of the greatest state contains at most $O(n^2)$ elements and the configuration at most $O(n^3)$ elements. Hence, there are at most $O(n^3)$ iterations before the fixed point is reached. Each iteration has complexity $O(n^4)$, because it requires checking at most $O(n)$ conditions for each element of the configuration. Similarly, in the second case we have at most $O(n^2)$ iterations, where each iteration has complexity $O(n^3)$. Therefore, it is not difficult to devise a naive implementation of the first analysis in $O(n^7)$ and of the second one in $O(n^5)$ using standard algorithms.
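The counting argument above can be restated as a short calculation. This is only a restatement of the quoted bounds. It assumes, as the argument does, that the size $n$ of $P$ bounds both the number of labelled ambients and the number of active subprocesses, and that each iteration adds at least one element to the state.

$$
\begin{aligned}
\text{first analysis:}\quad & |T^*| = O(n^2),\quad |C^*| = O(n^2)\cdot O(n) = O(n^3) \ \Rightarrow\ O(n^3) \text{ iterations},\\
& \text{cost per iteration } O(n^3)\cdot O(n) = O(n^4),\quad \text{total } O(n^3)\cdot O(n^4) = O(n^7);\\
\text{second analysis:}\quad & |T^\circ| = O(n^2),\quad |C^\circ| = O(n)\cdot O(n) = O(n^2) \ \Rightarrow\ O(n^2) \text{ iterations},\\
& \text{cost per iteration } O(n^2)\cdot O(n) = O(n^3),\quad \text{total } O(n^2)\cdot O(n^3) = O(n^5).
\end{aligned}
$$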

There are several CFAs [?], in Flow Logic style, which compute safe approximations of the same property. In more detail, the analysis of [?] could be obtained in our framework from the abstract transition rules of Tab. 5 by weakening the conditions to be checked for the execution of the continuations of capabilities. Our second analysis is therefore more precise than the one of [?], which can be implemented in $O(n^3)$ using more sophisticated techniques [?]. The CFA of [?] is a very powerful exponential counting analysis that refines the approach of [?] essentially by using multiplicity and by considering sets of abstract states rather than abstract states. The relation between the two analyses is demonstrated by a formal comparison in abstract interpretation style: we could obtain this analysis by removing the partial topology from the domain of our first analysis and by a standard “lift” to the power-set. Our first analysis combines the idea of multiplicity from [?] and, in some sense, the idea of using contextual information like the partial topology from the CFA of [?] for Safe Ambients [?]. The integration of these two aspects allows us to have a simple polynomial analysis collecting non-trivial information. For instance, as shown by Example 4.8 of Section 4, the analysis is able to capture that capabilities with multiplicity one can be used only once, although it does not employ sets of abstract states as in [?].

In our opinion the semantics-based abstract interpretation approach demonstrates several advantages over the “ad-hoc” syntax-based approaches. We have shown that existing analyses and new analyses can be specified in a common framework, where the analyses have a simpler formulation, are related to a formal definition of the property we want to approximate, and are safe and directly comparable in terms of precision by construction. The obtained analyses (in particular the one of [?]) are more precise, without affecting the complexity from an algorithmic point of view. Moreover, we believe that within this framework new analyses could be derived by simply modifying the abstract domain of properties according to standard abstract interpretation techniques.

This work is part of a project aimed at studying the relationship among abstract interpretation, CFA and types. As a short-term goal, we intend to also specify the types of MA [?] within the proposed abstract interpretation framework. We believe that the formalization of CFA and types in a common setting would allow us to finally compare their expressive power, integrate them, understand the pros and cons of each approach, and possibly determine for which class of properties one method is more adequate than another.
A Some Background on Abstract Interpretation

We briefly recall the basic concepts of abstract interpretation. The main idea is that of establishing a formal relationship, using Galois connections, between the concrete domain $(C, \le)$ of computation and a simpler abstract domain $(A, \le^\alpha)$ modelling the property we want to approximate. The ordering $\le^\alpha$ is intended to model precision, so that $a \le^\alpha a'$ means that $a'$ is a safe approximation of $a$.

Definition A.1 (Galois Connection). Let $(\alpha, \gamma)$ be a pair of functions such that $\alpha : C \to A$ (abstraction) and $\gamma : A \to C$ (concretization). The pair $(\alpha, \gamma)$ is a Galois connection iff $\forall c \in C, \forall a \in A,\ \alpha(c) \le^{\alpha} a \iff c \le \gamma(a)$. If also $\alpha(\gamma(a)) = a$ for all $a \in A$, then $(\alpha, \gamma)$ is called a Galois insertion.
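To make Definition A.1 concrete, the following Python sketch checks the Galois-connection condition by brute force on a toy instance that is not taken from the text: the concrete domain is the powerset of the small universe {0, ..., 5} ordered by inclusion, and the abstract domain is the parity lattice BOT ≤ EVEN, ODD ≤ TOP. The names `alpha`, `gamma` and `leq` are illustrative only.

```python
from itertools import chain, combinations

# Toy instance (an assumption, not from the text): concrete domain = subsets of a
# small finite universe ordered by inclusion; abstract domain = parity lattice.
UNIVERSE = frozenset(range(6))
BOT, EVEN, ODD, TOP = "BOT", "EVEN", "ODD", "TOP"
ABSTRACT = (BOT, EVEN, ODD, TOP)


def leq(a, b):
    """Abstract order: BOT <= EVEN, ODD <= TOP (EVEN and ODD are incomparable)."""
    return a == b or a == BOT or b == TOP


def alpha(c):
    """Abstraction: the most precise parity describing every element of c."""
    if not c:
        return BOT
    if all(x % 2 == 0 for x in c):
        return EVEN
    if all(x % 2 == 1 for x in c):
        return ODD
    return TOP


def gamma(a):
    """Concretization: the elements of the universe that have parity a."""
    if a == BOT:
        return frozenset()
    if a == TOP:
        return UNIVERSE
    wanted = 0 if a == EVEN else 1
    return frozenset(x for x in UNIVERSE if x % 2 == wanted)


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(t) for t in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_galois_connection():
    """Check alpha(c) <= a  <=>  c <= gamma(a) for every c and every a."""
    return all(leq(alpha(c), a) == (c <= gamma(a))
               for c in subsets(UNIVERSE) for a in ABSTRACT)


def is_galois_insertion():
    """A Galois connection that additionally satisfies alpha(gamma(a)) = a."""
    return is_galois_connection() and all(alpha(gamma(a)) == a for a in ABSTRACT)


print(is_galois_connection(), is_galois_insertion())  # True True
```

Here the pair is even a Galois insertion, because the universe contains both even and odd numbers, so every abstract element is the abstraction of its own concretization.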

Given a semantics $S$, computed as the least fixed-point of a semantic function $F$ over the concrete domain, $\alpha(S)$ gives the exact abstract property corresponding to $S$. Therefore, the safeness of an approximate semantics $S^{\alpha}$ over the abstract domain is guaranteed by the simple condition $\alpha(S) \le^{\alpha} S^{\alpha}$. One of the main results is that a safe approximate semantics $S^{\alpha}$ can be computed as the least fixed-point of an abstract semantic function $F^{\alpha}$, as stated by the following theorem.

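Theorem A.2 (Safeness). Let $(\alpha, \gamma)$ be a Galois connection between $\langle C, \le \rangle$ and $\langle A, \le^{\alpha} \rangle$. Moreover, let $F : C \to C$ and $F^{\alpha} : A \to A$ be monotonic functions. If $\alpha(F(c)) \le^{\alpha} F^{\alpha}(\alpha(c))$ for each $c \in C$, then $\alpha(\mathrm{lfp}\, F) \le^{\alpha} \mathrm{lfp}\, F^{\alpha}$.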
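Theorem A.2 can be illustrated on a similar toy example, again an assumption rather than anything from the text: the concrete function `F` collects the numbers below 10 reachable from 0 by steps of +2, and its parity counterpart `F_abs` joins EVEN with its argument. The sketch computes both least fixed points by Kleene iteration from the bottom element and checks the local soundness condition of the theorem exhaustively; the concretization of the parity domain is left implicit.

```python
from itertools import chain, combinations

# Toy instance of Theorem A.2 (an assumption, not from the text).
N = 10
UNIVERSE = frozenset(range(N))
BOT, EVEN, ODD, TOP = "BOT", "EVEN", "ODD", "TOP"


def leq(a, b):
    """Parity order: BOT <= EVEN, ODD <= TOP."""
    return a == b or a == BOT or b == TOP


def join(a, b):
    """Least upper bound in the parity lattice."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP


def alpha(c):
    """Abstraction of a set of integers into the parity lattice."""
    if not c:
        return BOT
    parities = {x % 2 for x in c}
    if parities == {0}:
        return EVEN
    if parities == {1}:
        return ODD
    return TOP


def F(c):
    """Concrete semantic function: 0 is reachable, and x + 2 is reachable from x (below N)."""
    return frozenset({0}) | frozenset(x + 2 for x in c if x + 2 < N)


def F_abs(a):
    """Abstract counterpart: 0 is EVEN, and adding 2 preserves parity."""
    return join(EVEN, a)


def lfp(f, bottom):
    """Kleene iteration from bottom; it terminates here because both lattices are finite."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


def soundness_condition_holds():
    """Check alpha(F(c)) <= F_abs(alpha(c)) for every subset c of the universe."""
    items = sorted(UNIVERSE)
    all_subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return all(leq(alpha(F(frozenset(c))), F_abs(alpha(frozenset(c)))) for c in all_subsets)


concrete = lfp(F, frozenset())
abstract = lfp(F_abs, BOT)
print(sorted(concrete), abstract, leq(alpha(concrete), abstract))
```

As the theorem predicts, the abstraction of the concrete least fixed point, EVEN, is below the abstract least fixed point (here they coincide).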
Abstract. We use Abstract Interpretation to automatically prove safety properties of mobile ambients with name communications. We introduce a non-standard semantics in order to distinguish different recursive instances of agents. This allows us to specify explicitly both the link between agents and the ambient names they have declared, and the link between agents and the ambients they have activated.

Then we derive from this non-standard semantics an abstract semantics which focuses on interactions between agents. This abstract semantics describes non-uniformly which agents can be launched in which ambients and which ambient names can be communicated to which agents. Such a description is required to prove security properties such as non-interference or confinement.



1 Introduction 

The development of large-scale communicating distributed systems requires both good models for mobile computation and well-fitted methods for analyzing properties of mobile systems. Mobility has quite a broad meaning. In the π-calculus [?], mobility is implicitly described by name passing: agents communicate channel names, which dynamically changes the communication topology between agents. In mobile ambients [?], mobility is explicit: ambients are bounded places, and agents give them the capability to move inside other ambients, taking their content with them. The connections between these two models are not well known yet, although it is known that the asynchronous π-calculus can be encoded into the ambient calculus [?]. Security properties of mobile systems are usually described either by simulation relations (such as non-interference [?]) or by some constraints over their control flow (such as confinement or level of secrecy [?]). Nevertheless, existing analyses often describe a set of configurations which lead to a security leak, then prove that such configurations can never occur. This second step requires approximating both the control flow and the shape of the system.

In previous works, we have proposed a control flow analysis [?] and an occurrence counting analysis [?] for mobile systems specified in the π-calculus. We propose to extend our framework with explicit mobility. In this paper we restrict our study to the analysis of the control flow of a mobile ambient, considering only name communications (instead of capability path communications). Our analysis consists in tagging each agent with an unambiguous marker which encodes the history of the replications that have led to its creation. Then, we label all the objects (ambients and ambient names) with the marker of the agent which has created them. For each agent, we abstract both the set of the ambients it can be spawned in and the set of the ambient names which can be communicated to it. We also capture algebraic properties of the involved markers.

* This work was supported by the RTD project IST-1999-20527 "DAEDALUS" of the European FP5 programme.

P. Cousot (Ed.): SAS 2001, LNCS 2126, pp. 412-[?], 2001.
© Springer-Verlag Berlin Heidelberg 2001

We claim that distinguishing the objects created by the recursive instances of an agent is crucial when analyzing mobile systems. The main difference between a mobile system and a system written in CCS is that recursive instances of agents can interfere via the objects they have themselves declared. Nevertheless, this aspect of mobility is ignored by most of the analyses proposed in the literature, which either identify recursive instances of agents [?] or prevent the recursive declaration of ambient names [2]. Ambient groups use dependent types to prevent names of a fresh group from ever being received outside the initial scope of this group. In addition, our analysis also handles the algebraic properties of the markers. This allows us to describe the interaction between recursive instances of agents, whereas [?] can only prove that ambient names are confined inside the recursive instance of the agent which has created them.

The semantics of mobile ambients is given in Sect. 2. The non-standard semantics is introduced in Sect. 3. The abstract interpretation framework is recalled in Sect. 4. Finally, we design a generic abstraction of the interactions between agents in Sect. 5 and give three examples of analyses in Sect. 6.

2 Mobile Ambients 

Mobile ambients [?] are a model of mobile computation. They describe a set of agents which are distributed throughout hierarchically organized domains called ambients. Agents interact inside ambients, which makes the ambients move, taking their content with them. We consider a lazy version of mobile ambients, in that replications are performed only when necessary. For the sake of simplicity, we restrict ourselves to name communications: only names, and not capability paths, can be communicated. Let $\mathcal{N}$ be a countable set of ambient names and $Lbl$ a countable set of labels. We give in Fig. 1 the standard semantics of mobile ambients. We identify each syntactic component of the system by attaching a distinct label of $Lbl$ to it.

(a) Syntax.

$n \in \mathcal{N}$ (ambient name), $\ell \in Lbl$ (label)

$P, Q ::= (\nu n)P$ (restriction)
  $\mid\ 0$ (inactivity)
  $\mid\ P \mid Q$ (composition)
  $\mid\ n^{\ell}[P]$ (ambient)
  $\mid\ M.P$ (capability action)
  $\mid\ io$ (input/output action)

$M ::= in^{\ell}\, n$ (can enter $n$)
  $\mid\ out^{\ell}\, n$ (can exit $n$)
  $\mid\ open^{\ell}\, n$ (can open $n$)
  $\mid\ !open^{\ell}\, n$ (can duplicate itself before opening $n$)

$io ::= (n)^{\ell}.P$ (input action)
  $\mid\ !(n)^{\ell}.P$ (input action with replication)
  $\mid\ \langle n \rangle^{\ell}$ (async output action)

Input action and restriction are the only name binders: in $(n)^{\ell}.P$, $!(n)^{\ell}.P$ and $(\nu n)P$, the occurrences of $n$ in $P$ are bound. The usual rules about scopes, substitution and $\alpha$-conversion apply. We denote by $FN(P)$ (resp. $BN(P)$) the set of the free (resp. bound) names of $P$.

(b) Congruence relation.

$P \equiv Q$ if $P =_{\alpha} Q$ ($\alpha$-conversion)
$P \mid Q \equiv Q \mid P$ (Commutativity)
$(P \mid Q) \mid R \equiv P \mid (Q \mid R)$ (Associativity)
$P \mid 0 \equiv P$ (Zero Par)
$(\nu n)0 \equiv 0$ (Zero Res)
$(\nu n)(\nu m)P \equiv (\nu m)(\nu n)P$ (Swapping)
$(\nu n)(P \mid Q) \equiv P \mid ((\nu n)Q)$ if $n \notin FN(P)$ (Extrusion Par)
$(\nu n)(m^{\ell}[P]) \equiv m^{\ell}[(\nu n)P]$ if $n \neq m$ (Extrusion Amb)

(c) Reduction relation.

$n^{\ell}[in^{\ell'}\, m.P \mid Q] \mid m^{\ell''}[R] \to m^{\ell''}[n^{\ell}[P \mid Q] \mid R]$
$m^{\ell}[n^{\ell'}[out^{\ell''}\, m.P \mid Q] \mid R] \to n^{\ell'}[P \mid Q] \mid m^{\ell}[R]$
$open^{\ell}\, n.P \mid n^{\ell'}[Q] \to P \mid Q$
$!open^{\ell}\, n.P \mid n^{\ell'}[Q] \to P \mid Q \mid !open^{\ell}\, n.P$
$(n)^{\ell}.P \mid \langle m \rangle^{\ell'} \to P[n \leftarrow m]$
$!(n)^{\ell}.P \mid \langle m \rangle^{\ell'} \to P[n \leftarrow m] \mid !(n)^{\ell}.P$
$\dfrac{P \to Q}{(\nu n)P \to (\nu n)Q}$  $\dfrac{P \to Q}{n^{\ell}[P] \to n^{\ell}[Q]}$  $\dfrac{P \to Q}{P \mid R \to Q \mid R}$  $\dfrac{P' \equiv P,\ P \to Q,\ Q \equiv Q'}{P' \to Q'}$

Fig. 1. Standard Semantics.

Example 1. We model a system $\mathcal{S}$ which describes a client-server protocol. To make things clearer, public (or global) names are written in roman, all the other names are written in italic, and we abstract away many computational aspects. A resource recursively creates an unbounded number of clients. Each client is represented by a packet $p[\,]$ which contains an ambient named "request". This ambient contains the client's query $\langle q \rangle$. First, each packet enters the "server" ambient and then activates a pilot ambient "duplicate" which communicates the packet name to the server. This communication creates a recursive instance of an ambient named "instance" which will process the packet. The "instance" ambient enters the packet, reads the request and sends it back inside an ambient named "answer". Finally, the packet exits the "server" ambient.

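$\mathcal{S}$ is defined as follows:

$\nu Pub := (\nu\,\mathrm{request})(\nu\,\mathrm{make})(\nu\,\mathrm{server})(\nu\,\mathrm{duplicate})(\nu\,\mathrm{instance})(\nu\,\mathrm{answer})$,
$C_1 := \mathrm{request}^{13}[\langle q \rangle^{14}]$, $C_2 := open^{15}\,\mathrm{instance}$,
$C_3 := in^{16}\,\mathrm{server}.\mathrm{duplicate}^{17}[out^{18}\,p.\langle p \rangle^{19}]$,
$C := (\nu q)(\nu p)p^{12}[C_1 \mid C_2 \mid C_3] \mid \langle \mathrm{make} \rangle^{20}$,
$I_1 := \mathrm{answer}^{8}[\langle rep \rangle^{9}]$, $I_2 := out^{10}\,\mathrm{server}$, $I := in^{5}\,k.open^{6}\,\mathrm{request}.(rep)^{7}.(I_1 \mid I_2)$,
$S_1 := !open^{2}\,\mathrm{duplicate}$, $S_2 := !(k)^{3}.\mathrm{instance}^{4}[I]$, $S := \mathrm{server}^{1}[S_1 \mid S_2]$,
$\mathcal{S} := (\nu Pub)(S \mid !(x)^{11}.C \mid \langle \mathrm{make} \rangle^{21})$.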

The following computation sequence describes the behaviour of the system: 

(i^Pub)(S I !(a)^^.C' I (make)®^) 

\{xY^.C I (make)®®] server ^ [Si | S2] | 

i 2 [requesti®[(pi)i 4 ] I C2 I 

m^®seryer .duplicate^^ [ out^^pi . (pi ) ^®] 



(i/Pub) 



{v qi){v pi)p_ 



(i/Pub)(i/ qi){iy pi) 

^!(a:)^^.C' I (make)®®! server^ 
(i^Pub)(i/ qi){iy pi) 

^!(a:)^^.C' I (make)®®! server 
(i/Pub)(j^ qi){iy pi) 

^!(a:)^^.C' I (make)®®! server 
(i/Pub)(i/ qi){iy pi) 

^!(a:)^^.C' I (make)®®! server 
{i^Puh){i' qi){iy pi) 

!(a;)^^.C'| (make)®®! server^ 

{i^Puh){i' qi){iy pi) 

/ 

\{xy^ .C I (make)®®! server^ 



Si I S2 I p^ 



12 



request^®[(gi)^"‘] | C2 
duplicate^^ [out^^p^ . {p 



! open® duplicate | S2 | duplicate^^[(pi 
pii®[request^®[(gi)i'‘] | C2] 

Si I !(fc)®. instance^ [I] | (p^[ 
pii®[request^®[(pi)i'‘] | C2] 



\19l 



,191 



19 1 



Si I S2 I p^^®[request^®[(gi)i'‘] | C2] | 

instance"^ [in^p^ ■ open® request . ( rep) ^(Ii|l2)]J 



Si I S2 



Pi 



12 



request^® [(91)^^] | open®® instance | 
instance^[open®request.(rep)^(Ii II2)] 



Si I S2 [pi^ 



request ^® [(g^)®"^] | 
open®request.(rep)^(Ii | I2) 



*{vPuh){iy qi){iy pi) 

(!(a;)^®.C I (make)®®! server ® [Si | S2 | pi®® [answer® [(gi)®] | ont®®seryer]]) 
(i^Pub)(i^ qi){iy pi) 

(!(ai)®®.C I (make)®®! server®[Si | S2] | pi®®[answer®[(gi)®]]) 

*(i^Pub)(i/ gi)(j/pi)(i^ g2)(i^P2)(!(ai)®®.C' | (make)®®] server®[Si | S2]| 

Pi®®[answer®[(gi)®]] | p2®®[answer®[(g2)®]]) 



□ 
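As a cross-check of the syntax in Fig. 1 and of Example 1, the following Python sketch encodes labeled terms as frozen dataclasses, computes $FN(P)$, and builds the system $\mathcal{S}$ with the labels used in its definition above. The class and function names are hypothetical, and the continuation of a capability is stored in a field named `body` for uniformity. The sketch only checks that $\mathcal{S}$ is closed and that its 21 labels are pairwise distinct, as the text assumes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Zero:            # 0
    pass

@dataclass(frozen=True)
class Nu:              # (nu n)P
    name: str
    body: "Proc"

@dataclass(frozen=True)
class Par:             # P | Q
    left: "Proc"
    right: "Proc"

@dataclass(frozen=True)
class Amb:             # n^l[P]
    name: str
    label: int
    body: "Proc"

@dataclass(frozen=True)
class Cap:             # M.P, where M is in^l n, out^l n, open^l n or !open^l n
    kind: str
    label: int
    name: str
    body: "Proc"

@dataclass(frozen=True)
class Inp:             # (n)^l.P, or !(n)^l.P when repl is True
    var: str
    label: int
    body: "Proc"
    repl: bool = False

@dataclass(frozen=True)
class Out:             # <n>^l, asynchronous output
    name: str
    label: int


Proc = Union[Zero, Nu, Par, Amb, Cap, Inp, Out]


def fn(p: Proc) -> frozenset:
    """Free names FN(P): restriction and input are the only binders."""
    if isinstance(p, Zero):
        return frozenset()
    if isinstance(p, Nu):
        return fn(p.body) - {p.name}
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, (Amb, Cap)):
        return fn(p.body) | {p.name}
    if isinstance(p, Inp):
        return fn(p.body) - {p.var}
    return frozenset({p.name})                     # Out


def labels(p: Proc) -> list:
    """Labels of P, read from left to right."""
    if isinstance(p, Zero):
        return []
    if isinstance(p, Nu):
        return labels(p.body)
    if isinstance(p, Par):
        return labels(p.left) + labels(p.right)
    if isinstance(p, (Amb, Cap, Inp)):
        return [p.label] + labels(p.body)
    return [p.label]                               # Out


def par(*ps: Proc) -> Proc:
    """Right-nested parallel composition P1 | (P2 | ...)."""
    return ps[0] if len(ps) == 1 else Par(ps[0], par(*ps[1:]))


# Example 1, with the labels of the definition of S above.
C = par(Nu("q", Nu("p", Amb("p", 12, par(
        Amb("request", 13, Out("q", 14)),
        Cap("open", 15, "instance", Zero()),
        Cap("in", 16, "server",
            Amb("duplicate", 17, Cap("out", 18, "p", Out("p", 19)))))))),
        Out("make", 20))
I = Cap("in", 5, "k", Cap("open", 6, "request", Inp("rep", 7, par(
        Amb("answer", 8, Out("rep", 9)), Cap("out", 10, "server", Zero())))))
SERVER = Amb("server", 1, par(Cap("!open", 2, "duplicate", Zero()),
                              Inp("k", 3, Amb("instance", 4, I), repl=True)))
SYSTEM = par(SERVER, Inp("x", 11, C, repl=True), Out("make", 21))
for public in reversed(["request", "make", "server", "duplicate", "instance", "answer"]):
    SYSTEM = Nu(public, SYSTEM)

print(sorted(fn(C)), fn(SYSTEM) == frozenset(), labels(SYSTEM) == list(range(1, 22)))
```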



3 Non-standard Semantics 

The non-standard semantics is a refined semantics with explicit substitution. It restores the link between the recursive instances of agents and the objects they have created (i.e. the names they have declared and the ambients they have activated). Following the style of [?], we describe a mobile system with a set of agents tagged with a location marker. Furthermore, the embedding structure of the ambients requires a description of the hierarchical tree of the administrative domains (or ambients). This is given by a set of activated ambients¹ (seen as locations) tagged with location markers specifying the surrounding ambient. We assume that a system is run inside a top level ambient which has no location. The link between agents and the ambient names they have declared is made explicit by tagging each agent with an unambiguous history marker allocated at its creation. Then, each new ambient name is tagged with the history marker of the agent which has declared it. In the same way, we restore the link between agents and the ambients they have activated by tagging each activated ambient with the history marker of the agent which has activated it.

Let $\mathcal{S}$ be a closed mobile system in the ambient calculus. We assume, without any loss of generality, that two name binders ($(\nu n)$ or $(n)$) are never used to bind the same ambient name. History markers are binary trees whose nodes are labeled with elements of $Lbl^2$ and whose leaves are not labeled (a leaf is denoted by $\varepsilon$). The tree having a node labeled $a$, a left subtree $t_1$ and a right subtree $t_2$ is denoted by $N(a, t_1, t_2)$. We denote by $Id$ the set of history markers. Ambient names are described by a pair $(n, id)$, where $n$ specifies which action $(\nu n)$ has created it, while $id$ is the history marker of the agent which has declared this name. Activated ambients are identified by a pair $(\ell, id)$, where $\ell$ is the label of the ambient constructor which has activated the ambient, while $id$ is the history marker of its activator². The top level ambient is represented by the pair $(top, \varepsilon)$ (we assume that $top \in Lbl$ has not been used for labeling $\mathcal{S}$). Location markers are pairs $(\ell, id)$, too. A location marker refers to the ambient where a process is spawned.

A non-standard configuration is a set of thread instances, where a thread instance is a tuple composed of a syntactic component, a history marker, a location marker and an environment. The syntactic component is either a syntactic copy of an agent of $\mathcal{S}$ or an activated ambient, denoted by $n^{\ell}[\bullet]$. The history marker is unambiguously allocated at the thread creation. The location marker indicates where the thread is run. The environment specifies the origin of the free syntactic ambient names of the syntactic component.



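¹ Also called privileged ambients in [1].
² An ambient cannot be identified by its ambient name because two distinct activated ambients can have the same name [?].
³ We do not show the origin of public names.

Example 2. We give here the non-standard configuration reached after completing two sessions of our server³:

$\{(\mathrm{server}^{1}[\bullet], \varepsilon, (top, \varepsilon), \emptyset),$
$(\mathrm{answer}^{8}[\bullet], id'_0, (12, id_0), \emptyset),$
$(\mathrm{answer}^{8}[\bullet], id'_1, (12, id_1), \emptyset),$
$(p^{12}[\bullet], id_0, (top, \varepsilon), [p \mapsto (p, id_0)]),$
$(p^{12}[\bullet], id_1, (top, \varepsilon), [p \mapsto (p, id_1)]),$
$(!(x)^{11}.C, \varepsilon, (top, \varepsilon), \emptyset),$
$(\langle \mathrm{make} \rangle^{20}, id_1, (top, \varepsilon), \emptyset),$
$(S_1, \varepsilon, (1, \varepsilon), \emptyset),$
$(S_2, \varepsilon, (1, \varepsilon), \emptyset),$
$(\langle rep \rangle^{9}, id'_0, (8, id'_0), [rep \mapsto (q, id_0)]),$
$(\langle rep \rangle^{9}, id'_1, (8, id'_1), [rep \mapsto (q, id_1)])\}$

where:

$id_0 = N((11, 21), \varepsilon, \varepsilon)$, $id_1 = N((11, 20), \varepsilon, id_0)$, $id'_0 = N((3, 19), \varepsilon, id_0)$, $id'_1 = N((3, 19), \varepsilon, id_1)$.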



The top five instances represent the hierarchical structure of nested ambients; the others describe the agent distribution. Location markers allow us to reconstruct the following ambient:

$(top, \varepsilon)\big[\,!(x)^{(11, \varepsilon)}.C \mid \langle \mathrm{make} \rangle^{(20, id_1)} \mid \mathrm{server}^{(1, \varepsilon)}[S_1 \mid S_2] \mid (\nu n)\big((p, id_0)^{(12, id_0)}[\mathrm{answer}^{(8, id'_0)}[\langle (q, id_0) \rangle^{(9, id'_0)}]] \mid (p, id_1)^{(12, id_1)}[\mathrm{answer}^{(8, id'_1)}[\langle (q, id_1) \rangle^{(9, id'_1)}]]\big)\big]$

in which ambients, ambient names and agents are stamped with their own markers. Thanks to name markers, we avoid conflicts between ambient names, so we can extrude their declarations up to the top level ambient. In this way, the shortcut $(\nu n)$ denotes the declaration of all the ambient names of the configuration. It appears explicitly that, in each packet, both the name of the packet and the name contained in the "answer" ambient embedded in the packet have been declared by the same recursive instance of the resource $!(x).C$. This means that the answer to a query is sent to the right client. □

The non-standard semantics is given in Fig. 3 by both an initial non-standard configuration and a reduction relation. Their definitions use the extraction function $\beta$ defined in Fig. 2. Given a continuation $P$, a history marker $id$, a location marker $loc$ and an environment $E$, $\beta(P, id, loc, E)$ gives the set of all the thread instances that must be spawned to simulate the computation of the process $E(P)$ identified with the marker $id$, in the ambient denoted by $loc$. In particular, it deals with new ambient name declarations and new ambient activations.

$\beta(n^{\ell}[P], id, loc, E) = \beta(P, id, (\ell, id), E) \cup \{(n^{\ell}[\bullet], id, loc, [n \mapsto E(n)])\}$
$\beta(P \mid Q, id, loc, E) = \beta(P, id, loc, E) \cup \beta(Q, id, loc, E)$
$\beta((\nu n)P, id, loc, E) = \beta(P, id, loc, E[n \mapsto (n, id)])$
$\beta(M.P, id, loc, E) = \{(M.P, id, loc, E_{|FN(M.P)})\}$
$\beta(io, id, loc, E) = \{(io, id, loc, E_{|FN(io)})\}$
$\beta(0, id, loc, E) = \emptyset$

Fig. 2. Extraction Function.

We informally describe the non-standard semantics. For the sake of brevity, we only detail the non-standard in migration rule. This rule involves two distinct ambients $\lambda$, $\mu$ and an agent $\psi$. They are respectively denoted by three thread instances $(n^{i}[\bullet], id_1, loc_1, E_1)$, $(m^{j}[\bullet], id_2, loc_2, E_2)$ and $(in^{k}\, o.P, id_3, loc_3, E_3)$. The in migration rule is enabled if and only if the two ambients are located in the same ambient (this gives the constraint $loc_1 = loc_2$), the agent is located in the first ambient (this gives $loc_3 = (i, id_1)$), and the agent capability can interact with the name of the second ambient (this is encoded by the constraint $E_2(m) = E_3(o)$). The result of such a migration is that the first ambient moves inside the second one (its location is just replaced by $(j, id_2)$). All its content is taken with it (this changes neither the location markers nor the environments of that content), but the agent $\psi$ is executed and its continuation is spawned inside the first ambient ($\psi$ is replaced by $\beta(P, id_3, loc_3, E_{3|FN(P)})$). The out migration is simulated in the same way. The ambient dissolution is a bit more complex, since the locations of all the content of the dissolved ambient are changed.

Note that each time a resource is fetched, a new history marker is deterministically allocated: it is given by $N((i, j), id_i, id_j)$, where $i$ is the label of the resource, $id_i$ is the history marker of the resource, $j$ is the label of the thread which enforces the resource fetching and $id_j$ is the history marker of this thread. We do not need a congruence relation, because our set-based representation of configurations makes structural congruence rules useless, and the use of history markers avoids conflicts between ambient names.
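The extraction function $\beta$ of Fig. 2 and the marker allocation $N((i, j), id_i, id_j)$ can be sketched directly in Python. The tuple encoding of terms, the `("active", n, l)` tag standing for an activated ambient $n^{\ell}[\bullet]$, and the representation of environments as frozen sets of pairs are assumptions made for illustration. Applied to the client $C$ of Example 1 with marker $id_0$ at the top level, it yields six thread instances: the packet $p^{12}[\bullet]$, the ambient $\mathrm{request}^{13}[\bullet]$ and four actions.

```python
# Terms are tuples (hypothetical encoding):
#   ("zero",) | ("nu", n, P) | ("par", P, Q) | ("amb", n, l, P)
#   ("cap", kind, l, n, P) | ("inp", x, l, P, repl) | ("out", n, l)
EPS = ("eps",)                        # the unlabeled leaf of history markers


def N(a, left, right):
    """History marker N(a, left, right); a is a pair of labels."""
    return ("N", a, left, right)


def fresh(i, id_i, j, id_j):
    """Marker allocated when thread (j, id_j) fetches resource (i, id_i)."""
    return N((i, j), id_i, id_j)


def fn(p):
    """Free names of a tuple-encoded term."""
    tag = p[0]
    if tag == "zero":
        return frozenset()
    if tag == "nu":
        return fn(p[2]) - {p[1]}
    if tag == "par":
        return fn(p[1]) | fn(p[2])
    if tag == "amb":
        return fn(p[3]) | {p[1]}
    if tag == "cap":
        return fn(p[4]) | {p[3]}
    if tag == "inp":
        return fn(p[3]) - {p[1]}
    return frozenset({p[1]})          # "out"


def restrict(env, names):
    """E restricted to the given names, frozen so thread instances are hashable."""
    return frozenset((k, v) for k, v in env.items() if k in names)


def beta(p, hid, loc, env):
    """Extraction function: thread instances (component, id, loc, env) spawned by E(P)."""
    tag = p[0]
    if tag == "zero":
        return set()
    if tag == "par":
        return beta(p[1], hid, loc, env) | beta(p[2], hid, loc, env)
    if tag == "nu":                   # the new name is tagged with the current marker
        _, n, body = p
        return beta(body, hid, loc, {**env, n: (n, hid)})
    if tag == "amb":                  # activate n^l[.] here; its body runs at (l, hid)
        _, n, l, body = p
        active = (("active", n, l), hid, loc, restrict(env, {n}))
        return beta(body, hid, (l, hid), env) | {active}
    return {(p, hid, loc, restrict(env, fn(p)))}   # capability or i/o action


# The client C of Example 1, spawned with marker id_0 at the top level.
PUBLIC = ["request", "make", "server", "duplicate", "instance", "answer"]
ENV0 = {n: (n, EPS) for n in PUBLIC}
C = ("par",
     ("nu", "q", ("nu", "p", ("amb", "p", 12, ("par",
         ("amb", "request", 13, ("out", "q", 14)),
         ("par", ("cap", "open", 15, "instance", ("zero",)),
                 ("cap", "in", 16, "server",
                  ("amb", "duplicate", 17, ("cap", "out", 18, "p", ("out", "p", 19))))))))),
     ("out", "make", 20))
ID0 = fresh(11, EPS, 21, EPS)
threads = beta(C, ID0, ("top", EPS), ENV0)
print(len(threads))
```

Because the restrictions $(\nu q)$ and $(\nu p)$ are executed with marker $id_0$, every thread that mentions $p$ or $q$ carries the origins $(p, id_0)$ and $(q, id_0)$, which is exactly what Example 2 exploits.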

Standard and non-standard semantics are strongly bisimilar. The proof relies on the fact that non-standard computations cannot yield conflicts between history markers. Moreover, in accordance with the following proposition, we can simplify the shape of the history markers without losing the consistency of our semantics:

$\phi_1 : Id \to (Lbl^2)^*$, $N(a, b, c) \mapsto a.\phi_1(c)$, $\varepsilon \mapsto \varepsilon$;
$\phi_2 : Id \to Lbl^*$, $N((i, j), b, c) \mapsto j.\phi_2(c)$, $\varepsilon \mapsto \varepsilon$.

Proposition 1. Let $\phi$ be $\phi_1$ or $\phi_2$, and let $C_0 \to \cdots \to C_n$ be a non-standard computation sequence, where $C_0 = C_0(\mathcal{S})$. For all $i, j \in [\![0, n]\!]$, $(p, id, loc, E) \in C_i$ and $(p', id', loc', E') \in C_j$ such that $\phi(id) = \phi(id')$, we have $id = id'$.

Such simplifications allow us to reduce the cost of our analysis, but also lead to a loss of accuracy, since they merge information related to distinct computation sequences of the system.
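The simplifications $\phi_1$ and $\phi_2$ only follow the right spine of a marker. The Python sketch below encodes markers with a hypothetical tuple representation (`("eps",)` for the leaf and `("N", a, left, right)` for a node) and applies both maps to the four markers of Example 2; on these markers both maps remain injective, in line with Proposition 1.

```python
EPS = ("eps",)


def N(a, left, right):
    """History marker N(a, left, right) with a = (i, j) a pair of labels."""
    return ("N", a, left, right)


def phi1(marker):
    """phi_1(N(a, b, c)) = a . phi_1(c) and phi_1(eps) = eps: follow right subtrees."""
    word = []
    while marker != EPS:
        _, a, _left, right = marker
        word.append(a)
        marker = right
    return tuple(word)


def phi2(marker):
    """phi_2(N((i, j), b, c)) = j . phi_2(c): keep only the fetching thread's label."""
    return tuple(j for (_i, j) in phi1(marker))


# Markers of Example 2.
id0 = N((11, 21), EPS, EPS)
id1 = N((11, 20), EPS, id0)
id0p = N((3, 19), EPS, id0)
id1p = N((3, 19), EPS, id1)
for m in (id0, id1, id0p, id1p):
    print(phi1(m), phi2(m))
```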



(a) Initial configuration.

$C_0(\mathcal{S}) = \beta(\mathcal{S}, \varepsilon, (top, \varepsilon), \emptyset)$.

(b) Move rules.

If $C$ is a non-standard configuration, if there are $\lambda, \mu, \psi$ in $C$ ($\lambda \neq \mu$), with $\lambda = (n^{i}[\bullet], id_1, loc_1, E_1)$, $\mu = (m^{j}[\bullet], id_2, loc_2, E_2)$ and $\psi = (in^{k}\, o.P, id_3, loc_3, E_3)$, such that $loc_1 = loc_2$, $loc_3 = (i, id_1)$ and $E_2(m) = E_3(o)$, then $C \to (C \setminus \{\lambda, \psi\}) \cup \{(n^{i}[\bullet], id_1, (j, id_2), E_1)\} \cup \beta(P, id_3, loc_3, E_{3|FN(P)})$.

If $C$ is a non-standard configuration, if there are $\lambda, \mu, \psi$ in $C$, with $\lambda = (m^{i}[\bullet], id_1, loc_1, E_1)$, $\mu = (n^{j}[\bullet], id_2, loc_2, E_2)$ and $\psi = (out^{k}\, o.P, id_3, loc_3, E_3)$, such that $loc_2 = (i, id_1)$, $loc_3 = (j, id_2)$ and $E_1(m) = E_3(o)$, then $C \to (C \setminus \{\mu, \psi\}) \cup \{(n^{j}[\bullet], id_2, loc_1, E_2)\} \cup \beta(P, id_3, loc_3, E_{3|FN(P)})$.

If $C$ is a non-standard configuration, if there are $\lambda, \mu$ in $C$, with $\lambda = (open^{i}\, m.P, id_1, loc_1, E_1)$ and $\mu = (n^{j}[\bullet], id_2, loc_2, E_2)$, such that $loc_1 = loc_2$ and $E_1(m) = E_2(n)$, then $C \to (C \setminus (\{\lambda, \mu\} \cup A)) \cup \beta(P, id_1, loc_1, E_{1|FN(P)}) \cup A'$, where $A = \{(a, id, loc, E) \in C \mid loc = (j, id_2)\}$ and $A' = \{(a, id, loc_2, E) \mid (a, id, (j, id_2), E) \in C\}$.

If $C$ is a non-standard configuration, if there are $\lambda, \mu$ in $C$, with $\lambda = (!open^{i}\, m.P, id_1, loc_1, E_1)$ and $\mu = (n^{j}[\bullet], id_2, loc_2, E_2)$, such that $loc_1 = loc_2$ and $E_1(m) = E_2(n)$, then $C \to (C \setminus (\{\mu\} \cup A)) \cup \beta(P, N((i, j), id_1, id_2), loc_1, E_{1|FN(P)}) \cup A'$, where $A = \{(a, id, loc, E) \in C \mid loc = (j, id_2)\}$ and $A' = \{(a, id, loc_2, E) \mid (a, id, (j, id_2), E) \in C\}$.

(c) Communication rules.

If $C$ is a non-standard configuration, if there are $\lambda, \mu$ in $C$, with $\lambda = ((n)^{i}.P, id_1, loc_1, E_1)$ and $\mu = (\langle m \rangle^{j}, id_2, loc_2, E_2)$, such that $loc_1 = loc_2$, then $C \to (C \setminus \{\lambda, \mu\}) \cup \beta(P, id_1, loc_1, E_1[n \mapsto E_2(m)]_{|FN(P)})$.

If $C$ is a non-standard configuration, if there are $\lambda, \mu$ in $C$, with $\lambda = (!(n)^{i}.P, id_1, loc_1, E_1)$ and $\mu = (\langle m \rangle^{j}, id_2, loc_2, E_2)$, such that $loc_1 = loc_2$, then $C \to (C \setminus \{\mu\}) \cup \beta(P, N((i, j), id_1, id_2), loc_1, E_1[n \mapsto E_2(m)]_{|FN(P)})$.

Fig. 3. Non-standard Semantics.

4 Abstract Interpretation Framework

We denote by $\mathcal{C}$ the set of all possible non-standard configurations and by $\Sigma$ the set of transition labels. We are actually interested in the set $Coll(\mathcal{S})$ of all the configurations a system may take during a finite sequence of computation steps. This is given by its collecting semantics $[\![\mathcal{S}]\!]$ and can be expressed as the least fixpoint of a $\cup$-complete endomorphism $F$ on the complete lattice $\wp(\Sigma^* \times \mathcal{C})$ defined as follows:

$F(X) = \{(\varepsilon, C_0(\mathcal{S}))\} \cup \{(u.\lambda, C') \mid \exists C \in \mathcal{C},\ (u, C) \in X \text{ and } C \xrightarrow{\lambda} C'\}$.
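On a finite, acyclic transition system, the least fixpoint of $F$ can be computed by plain Kleene iteration from the empty set. The following Python sketch does this for a hypothetical toy system on integers (it is not the ambient semantics): a pair $(u, C)$ is a trace of labels together with the configuration it reaches, and `coll` keeps only the configurations. On a cyclic system the set of traces is infinite and the iteration would not terminate, which is why an approximation is needed.

```python
def F(X, initial, step):
    """F(X) = {(eps, C0)} U {(u.lam, C') | (u, C) in X and C --lam--> C'}."""
    result = {((), initial)}
    for trace, config in X:
        for lam, nxt in step(config):
            result.add((trace + (lam,), nxt))
    return frozenset(result)


def lfp_F(initial, step):
    """Kleene iteration from the empty set; terminates only if all traces are finite."""
    X = frozenset()
    while True:
        Y = F(X, initial, step)
        if Y == X:
            return X
        X = Y


def coll(initial, step):
    """The configurations reachable in finitely many steps."""
    return {config for _trace, config in lfp_F(initial, step)}


def toy_step(c):
    """Hypothetical acyclic transition system on integers (not from the text)."""
    return [("a", c + 1), ("b", c + 2)] if c < 3 else []


print(sorted(coll(0, toy_step)))  # [0, 1, 2, 3, 4]
```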



This least fixpoint is usually not decidable, so we use the Abstract Interpretation framework [?] to compute a sound, but not necessarily complete, approximation of it. More precisely, we use the relaxed version of Abstract Interpretation [?], in which, among other things, the abstract domain is not required to be complete under least upper bounds; furthermore, no abstraction function is required.

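Definition 1. An abstraction is a tuple $(\mathcal{C}^\sharp, \sqsubseteq^\sharp, \sqcup^\sharp, \bot^\sharp, \gamma, C_0^\sharp, \rightsquigarrow, \nabla)$ such that

1. $(\mathcal{C}^\sharp, \sqsubseteq^\sharp)$ is a pre-order;

2. $\sqcup^\sharp : \wp_{finite}(\mathcal{C}^\sharp) \to \mathcal{C}^\sharp$ is such that $\forall A^\sharp \in \wp_{finite}(\mathcal{C}^\sharp), \forall a^\sharp \in A^\sharp,\ a^\sharp \sqsubseteq^\sharp \sqcup^\sharp(A^\sharp)$;

3. $\bot^\sharp \in \mathcal{C}^\sharp$ satisfies $\forall a^\sharp \in \mathcal{C}^\sharp,\ \bot^\sharp \sqsubseteq^\sharp a^\sharp$;

4. $\gamma : \mathcal{C}^\sharp \to \wp(\Sigma^* \times \mathcal{C})$ is a monotonic map which satisfies $\gamma(\bot^\sharp) = \emptyset$;

5. $C_0^\sharp \in \mathcal{C}^\sharp$ is such that $(\varepsilon, C_0(\mathcal{S})) \in \gamma(C_0^\sharp)$;

6. $\rightsquigarrow\ \in \wp(\mathcal{C}^\sharp \times \Sigma \times \mathcal{C}^\sharp)$ is an abstract deterministic labeled transition relation over $\mathcal{C}^\sharp$ such that: $\forall C^\sharp \in \mathcal{C}^\sharp, \forall (u, C) \in \gamma(C^\sharp), \forall \lambda \in \Sigma, \forall C' \in \mathcal{C}$,
$C \xrightarrow{\lambda} C' \implies \exists C'^\sharp \in \mathcal{C}^\sharp,\ (C^\sharp \overset{\lambda}{\rightsquigarrow} C'^\sharp \text{ and } (u.\lambda, C') \in \gamma(C'^\sharp))$;

7. $\nabla : \mathcal{C}^\sharp \times \mathcal{C}^\sharp \to \mathcal{C}^\sharp$ is a widening operator which satisfies:

- $\forall C_1^\sharp, C_2^\sharp \in \mathcal{C}^\sharp$, $C_1^\sharp \sqsubseteq^\sharp C_1^\sharp \nabla C_2^\sharp$ and $C_2^\sharp \sqsubseteq^\sharp C_1^\sharp \nabla C_2^\sharp$,

- $\forall (C_n^\sharp)_{n \in \mathbb{N}} \in (\mathcal{C}^\sharp)^{\mathbb{N}}$, the sequence $(C_n^{\nabla})_{n \in \mathbb{N}}$ defined as
$C_0^{\nabla} = C_0^\sharp$, $C_{n+1}^{\nabla} = C_n^{\nabla} \nabla C_{n+1}^\sharp$
is ultimately stationary.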

$\mathcal{C}^\sharp$ is an abstract domain. It captures the properties we are interested in, and abstracts away many other properties. The pre-order describes the amount of information which is known about the properties we approximate. We use only a pre-order to allow some concrete properties to be described by several unrelated abstract elements. $\sqcup^\sharp$ is used to gather the information described by several abstract elements; for the sake of generality, it does not necessarily compute the least upper bound of a finite set of abstract elements, which may not even exist. $\bot^\sharp$ describes the empty set; it provides the basis for our abstract iteration. $\gamma$ is a concretization function which maps each abstract property to the set of the concrete elements which satisfy this property. $C_0^\sharp$ is an abstract element which describes the properties satisfied by the initial configuration of the system, $\rightsquigarrow$ is used for mimicking the concrete transition system in the abstract domain, and $\nabla$ is used to ensure the convergence of the analysis.

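In accordance with Def. 1, the abstract counterpart $F^\sharp$ to $F$, defined as:

$F^\sharp(C^\sharp) = \sqcup^\sharp\big(\{C'^\sharp \mid \exists \lambda \in \Sigma,\ C^\sharp \overset{\lambda}{\rightsquigarrow} C'^\sharp\} \cup \{C^\sharp; C_0^\sharp\}\big)$

satisfies the soundness condition $\forall C^\sharp \in \mathcal{C}^\sharp,\ F \circ \gamma(C^\sharp) \subseteq \gamma \circ F^\sharp(C^\sharp)$. Using Kleene's theorem, we obtain the soundness of our analysis: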
Theorem 1. $\mathrm{lfp}_{\emptyset} F \subseteq \bigcup_{n \in \mathbb{N}} \gamma\big((F^\sharp)^n(\bot^\sharp)\big)$.

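In the following, we compute a sound and decidable approximation of our abstract semantics by using the widening operator $\nabla$:

Theorem 2. The abstract iteration $(F_n^{\nabla})_{n \in \mathbb{N}}$ of $F^\sharp$ defined as follows:

$F_0^{\nabla} = \bot^\sharp$,
$F_{n+1}^{\nabla} = F_n^{\nabla}$ if $F^\sharp(F_n^{\nabla}) \sqsubseteq^\sharp F_n^{\nabla}$,
$F_{n+1}^{\nabla} = F_n^{\nabla} \nabla F^\sharp(F_n^{\nabla})$ otherwise,

is ultimately stationary and its limit $F^{\nabla}$ satisfies $Coll(\mathcal{S}) \subseteq \gamma(F^{\nabla})$.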
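The abstract iteration of Theorem 2 is easy to run once a widening is fixed. The sketch below uses the standard interval domain with the classic interval widening, chosen only for illustration (it is not the domain used in this paper), on the hypothetical loop `x := 0; loop { x := x + 1 }`. The iteration stabilizes after two widening steps at $[0, +\infty)$, whereas plain Kleene iteration on intervals would never stop.

```python
import math

BOTTOM = None                        # the empty interval


def leq(a, b):
    """Interval inclusion, with None as the empty interval."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def join(a, b):
    """Smallest interval containing both arguments."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a, b):
    """Standard interval widening: any bound that is not stable jumps to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)


def F_abs(a):
    """Hypothetical abstract step for: x := 0; loop { x := x + 1 }."""
    start = (0, 0)
    if a is None:
        return start
    return join(start, join(a, (a[0] + 1, a[1] + 1)))


def abstract_iteration(f, bottom=BOTTOM):
    """F_0 = bottom; F_{n+1} = F_n if f(F_n) <= F_n, else F_n widen f(F_n)."""
    x = bottom
    while True:
        fx = f(x)
        if leq(fx, x):
            return x
        x = widen(x, fx)


print(abstract_iteration(F_abs))     # (0, inf)
```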



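Remark 1. We claim that this framework is highly extensible: given two abstractions $(\mathcal{C}_1^\sharp, \sqsubseteq_1^\sharp, \sqcup_1^\sharp, \bot_1^\sharp, \gamma_1, C_{0,1}^\sharp, \rightsquigarrow_1, \nabla_1)$ and $(\mathcal{C}_2^\sharp, \sqsubseteq_2^\sharp, \sqcup_2^\sharp, \bot_2^\sharp, \gamma_2, C_{0,2}^\sharp, \rightsquigarrow_2, \nabla_2)$, and a reduction operator⁴ $\rho : \mathcal{C}_1^\sharp \times \mathcal{C}_2^\sharp \to \mathcal{C}_1^\sharp \times \mathcal{C}_2^\sharp$ which satisfies:

$\forall a^\sharp = (a_1^\sharp, a_2^\sharp) \in \mathcal{C}_1^\sharp \times \mathcal{C}_2^\sharp,\ \gamma_1(a_1^\sharp) \cap \gamma_2(a_2^\sharp) \subseteq \gamma_1(b_1^\sharp) \cap \gamma_2(b_2^\sharp)$, denoting $\rho(a^\sharp)$ by $(b_1^\sharp, b_2^\sharp)$,

the following tuple $(\mathcal{C}^\sharp, \sqsubseteq^\sharp, \sqcup^\sharp, \bot^\sharp, \gamma, C_0^\sharp, \rightsquigarrow, \nabla)$, where

- $\mathcal{C}^\sharp = \mathcal{C}_1^\sharp \times \mathcal{C}_2^\sharp$;

- $\sqsubseteq^\sharp$, $\sqcup^\sharp$, $\bot^\sharp$ and $\nabla$ are defined pair-wise;

- $\gamma : \mathcal{C}^\sharp \to \wp(\Sigma^* \times \mathcal{C})$, $(a_1^\sharp, a_2^\sharp) \mapsto \gamma_1(a_1^\sharp) \cap \gamma_2(a_2^\sharp)$;

- $C_0^\sharp = \rho(C_{0,1}^\sharp, C_{0,2}^\sharp)$;

- $\rightsquigarrow$ is defined by: $a \overset{\lambda}{\rightsquigarrow} d$ if and only if, denoting $\rho(a)$ by $(b_1, b_2)$, there exist $c_1 \in \mathcal{C}_1^\sharp$ and $c_2 \in \mathcal{C}_2^\sharp$ such that $b_1 \overset{\lambda}{\rightsquigarrow}_1 c_1$, $b_2 \overset{\lambda}{\rightsquigarrow}_2 c_2$ and $d = \rho(c_1, c_2)$,

is also an abstraction.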
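A reduction operator in the sense of Remark 1 can be illustrated on the product of the parity domain with integer intervals, a standard pairing that is not used in this paper. The hypothetical `rho` below tightens interval bounds to the parity and lets a singleton interval determine the parity; `gamma_pair` computes $\gamma_1 \cap \gamma_2$ on a finite universe of integers so that the condition $\gamma_1(a_1^\sharp) \cap \gamma_2(a_2^\sharp) \subseteq \gamma_1(b_1^\sharp) \cap \gamma_2(b_2^\sharp)$ can be checked directly.

```python
import math

PARITIES = ("BOT", "EVEN", "ODD", "TOP")


def gamma_pair(parity, interval, universe):
    """gamma_1(parity) intersected with gamma_2(interval), on a finite universe of integers."""
    if parity == "BOT" or interval is None:
        return frozenset()
    lo, hi = interval
    ok_parity = {"EVEN": lambda x: x % 2 == 0,
                 "ODD": lambda x: x % 2 == 1,
                 "TOP": lambda x: True}[parity]
    return frozenset(x for x in universe if lo <= x <= hi and ok_parity(x))


def rho(parity, interval):
    """Hypothetical reduction on (parity, integer interval) pairs."""
    if parity == "BOT" or interval is None:
        return ("BOT", None)
    lo, hi = interval
    if parity in ("EVEN", "ODD"):
        r = 0 if parity == "EVEN" else 1
        if lo != -math.inf and lo % 2 != r:
            lo += 1                  # move the lower bound to the next value of that parity
        if hi != math.inf and hi % 2 != r:
            hi -= 1                  # move the upper bound to the previous value of that parity
        if lo > hi:
            return ("BOT", None)
    if lo == hi and not math.isinf(lo):
        parity = "EVEN" if lo % 2 == 0 else "ODD"   # a singleton fixes the parity
    return (parity, (lo, hi))


print(rho("EVEN", (1, 5)), rho("ODD", (2, 2)), rho("TOP", (3, 3)))
```

Reducing before each abstract transition, as in the definition of $\rightsquigarrow$ above, lets each component benefit from information discovered by the other.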



5 Control Flow Analysis 

We propose to describe all the potential interactions between all the agents of a given mobile ambient. For that purpose, we will compute for each thread an approximation of both the set of the ambients it can be immediately located in and the set of the ambient names which can be communicated to this thread. We want a non-uniform description of this. This means that we will compare the history marker of each thread with the marker of its location and with the markers of the ambient names which are communicated to it.

The main difficulty is to synthesize such comparisons throughout computation steps. We use the history marker of each thread as a pivot to synthesize the comparison between the other markers (location markers and markers of the ambient names) of this thread. Furthermore, we use synchronization conditions on ambient names and on locations of the agents to establish a comparison between the history markers of all the involved threads. Our main strategy is simple: we first gather all the information we have about the pairs of markers (this means that we will abstract sets of tuples of markers). Then synchronization conditions give equality relationships between tuple components. If the equality relationships are satisfiable, the abstract computation step is enabled and we compute, for each new thread, the comparison between its history marker and its other markers.

⁴ $\rho$ allows the properties obtained in the two abstractions to be simplified.
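For each $n \in \mathbb{N}$, we introduce an abstract pre-order $(Id_n^\sharp, \sqsubseteq_n)$ to represent sets of $n$-tuples of history markers. Each $Id_n^\sharp$ is related to $\wp(Id^n)$ by a monotonic concretization function $\gamma_n$. We introduce a few abstract primitives to handle these domains: a representation of the empty set $\bot_n$, a representation of the initial identifier $\varepsilon^\sharp$, an abstract union $\sqcup_n$, an associative abstract concatenation $\bullet^\sharp$ to gather the abstraction of tuple sets, an abstract join $assert$ to enforce synchronization conditions, an abstract projection $\Pi$, and an abstract push operator $push$ which is used to calculate the abstraction of the set of the new markers when fetching a resource. These primitives shall satisfy the following properties:

- $\gamma_n(\bot_n) = \emptyset$;

- $\varepsilon \in \gamma_1(\varepsilon^\sharp)$;

- $\forall A \in \wp_{finite}(Id_n^\sharp)$, $\sqcup_n(A) \in Id_n^\sharp$ and $\forall a^\sharp \in A$, $a^\sharp \sqsubseteq_n \sqcup_n(A)$;

- $\forall a \in Id_n^\sharp, b \in Id_m^\sharp$, $(a \bullet^\sharp b) \in Id_{n+m}^\sharp$ and
$\{(id_i)_{i \in [\![1;n+m]\!]} \mid (id_i)_{i \in [\![1;n]\!]} \in \gamma_n(a),\ (id_{n+i})_{i \in [\![1;m]\!]} \in \gamma_m(b)\} \subseteq \gamma_{n+m}(a \bullet^\sharp b)$;

- $\forall a^\sharp \in Id_n^\sharp, \forall A \in \wp([\![1;n]\!]^2)$, $assert(A, a^\sharp) \in Id_n^\sharp$ and
$\{(id_i)_{i \in [\![1;n]\!]} \in \gamma_n(a^\sharp) \mid \forall (k, l) \in A,\ id_k = id_l\} \subseteq \gamma_n(assert(A, a^\sharp))$;

- $\forall a^\sharp \in Id_n^\sharp, \forall p \in \mathbb{N}, \forall (s_k)_{k \in [\![1;p]\!]} \in [\![1;n]\!]^p$ such that $(s_k)$ is a one-to-one sequence, $\Pi_{(s_k)}(a^\sharp) \in Id_p^\sharp$ and
$\{(id_{s_k})_{k \in [\![1;p]\!]} \mid (id_i)_{i \in [\![1;n]\!]} \in \gamma_n(a^\sharp)\} \subseteq \gamma_p(\Pi_{(s_k)}(a^\sharp))$;

- $\forall (i, j) \in Lbl^2, \forall a^\sharp \in Id_3^\sharp$, $push_{(i,j)}(a^\sharp) \in Id_2^\sharp$ and
$\{(N((i, j), id_1, id_2), id_3) \mid (id_1, id_2, id_3) \in \gamma_3(a^\sharp)\} \subseteq \gamma_2(push_{(i,j)}(a^\sharp))$.

Moreover, we define the operator $dpush \in \mathcal{F}(Id_1^\sharp, Id_2^\sharp)$ by:

$\forall a \in Id_1^\sharp,\ dpush(a) = assert(\{(1, 2)\}, a \bullet^\sharp a)$.

The operator $dpush$ satisfies the following property:

$\forall a \in Id_1^\sharp,\ \{(id, id) \mid id \in \gamma_1(a)\} \subseteq \gamma_2(dpush(a))$.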
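The simplest domain satisfying all the properties above is the exact one, in which an element of $Id_n^\sharp$ is a finite set of $n$-tuples and $\gamma_n$ is the identity. It is not a practical abstraction (the sets of markers can be infinite), but it makes the intended meaning of each primitive explicit. The Python sketch below implements it; `assert_` carries a trailing underscore because `assert` is a Python keyword, and markers use a hypothetical tuple encoding. The last line builds, for the resource $!(x)^{11}$ with marker $\varepsilon$ fetched by $\langle \mathrm{make} \rangle^{21}$ with marker $\varepsilon$ at location $(top, \varepsilon)$, the pair (new marker, location marker), i.e. $(id_0, \varepsilon)$ of Example 2.

```python
from itertools import product

EPS = ("eps",)


def N(a, left, right):
    """History marker N(a, left, right) with a = (i, j) a pair of labels."""
    return ("N", a, left, right)


# Exact instance (an assumption): an element of Id_n is a finite set of n-tuples,
# gamma_n is the identity, and every primitive is computed exactly.
def union(*sets):
    """Abstract union: plain set union."""
    return frozenset().union(*sets)


def concat(a, b):
    """a . b: every tuple of a followed by every tuple of b."""
    return frozenset(x + y for x, y in product(a, b))


def assert_(pairs, a):
    """assert(A, a): keep tuples whose components k and l agree for all (k, l) in A (1-based)."""
    return frozenset(t for t in a if all(t[k - 1] == t[l - 1] for k, l in pairs))


def proj(indices, a):
    """Pi_(s_1, ..., s_p)(a): keep the listed components (1-based), in that order."""
    return frozenset(tuple(t[s - 1] for s in indices) for t in a)


def push(i, j, a):
    """push_(i,j): (id1, id2, id3) |-> (N((i, j), id1, id2), id3)."""
    return frozenset((N((i, j), t[0], t[1]), t[2]) for t in a)


def dpush(a):
    """dpush(a) = assert({(1, 2)}, a . a), i.e. the pairs (id, id)."""
    return assert_({(1, 2)}, concat(a, a))


BOTTOM = frozenset()                 # bottom_n: the empty set of tuples
EPS_1 = frozenset({(EPS,)})          # representation of the initial identifier
print(dpush(EPS_1))
print(push(11, 21, concat(concat(EPS_1, EPS_1), EPS_1)))
```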

We denote by $\mathcal{V}$ the set of all the syntactic components of $\mathcal{S}$. We describe the set of ambients in which threads can be launched by associating with each pair $(P, \ell) \in \mathcal{V} \times Lbl$ a description of the set of the marker pairs $(id_P, id_\ell)$ such that an instance of $P$ may be stamped with both the history marker $id_P$ and the location marker $(\ell, id_\ell)$. In addition, we describe the set of the ambient names which can be communicated to a thread by relating each triplet $(P, m, n) \in \mathcal{V} \times BN(\mathcal{S}) \times BN(\mathcal{S})$ to a description of the set of the marker pairs $(id_P, id_n)$ such that an instance of $P$ may be stamped with the marker $id_P$ while the syntactic name $m$ of $P$ is bound to the ambient name $(n, id_n)$. In this way, our abstraction $(\mathcal{C}^\sharp, \sqsubseteq^\sharp, \sqcup^\sharp, \bot^\sharp, \gamma, C_0^\sharp, \rightsquigarrow, \nabla)$ is defined as follows:

- $\mathcal{C}^\sharp = \mathcal{F}(\mathcal{V} \times Lbl, Id_2^\sharp) \times \mathcal{F}(\mathcal{V} \times BN(\mathcal{S}) \times BN(\mathcal{S}), Id_2^\sharp)$;

- $\sqsubseteq^\sharp$, $\sqcup^\sharp$ and $\nabla$ are defined component-wise, then pair-wise;

- $\bot^\sharp$ is given by the pair of functions which map any element to $\bot_2$;

- $\gamma((f, g))$ is given by the set of the configurations $C$ which satisfy, for all $(P, id, (\ell, id_\ell), E) \in C$:
$(id, id_\ell) \in \gamma_2(f(P, \ell))$ and $E(m) = (n, id_n) \implies (id, id_n) \in \gamma_2(g(P, m, n))$;

- $C_0^\sharp$ and $\rightsquigarrow$ are given in Figs. 5, 6 and 7.

Abstract transition rules just mimic the non-standard ones; they are therefore quite complicated, because non-standard transition rules handle several synchronizations between markers. Their definition uses an abstract extraction function $\beta^\sharp$ defined in Fig. 4.



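- $\beta^\sharp(n^{\ell}[P], id^\sharp, loc^\sharp, E^\sharp) = (a[(n^{\ell}[\bullet], j) \mapsto loc^\sharp(j)],\ b[(n^{\ell}[\bullet], n, m) \mapsto E^\sharp(n, m)])$,
where $(a, b) = \beta^\sharp(P, id^\sharp, [\ell \mapsto dpush(id^\sharp)], E^\sharp)$;

- $\beta^\sharp(P \mid Q, id^\sharp, loc^\sharp, E^\sharp) = \sqcup^\sharp\{\beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp); \beta^\sharp(Q, id^\sharp, loc^\sharp, E^\sharp)\}$;

- $\beta^\sharp((\nu n)P, id^\sharp, loc^\sharp, E^\sharp) = \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp[(n, n) \mapsto dpush(id^\sharp)])$;

- $\beta^\sharp(M.P, id^\sharp, loc^\sharp, E^\sharp) = (a, b)$, where
$a = [(M.P, \ell) \mapsto loc^\sharp(\ell)]$ and
$b = [(M.P, m, n) \mapsto E^\sharp(m, n) \text{ if } m \in FN(M.P)]$;

- $\beta^\sharp(io, id^\sharp, loc^\sharp, E^\sharp) = (a, b)$, where
$a = [(io, \ell) \mapsto loc^\sharp(\ell)]$ and
$b = [(io, m, n) \mapsto E^\sharp(m, n) \text{ if } m \in FN(io)]$;

- $\beta^\sharp(0, id^\sharp, loc^\sharp, E^\sharp) = \bot^\sharp$.

Fig. 4. Abstract Extraction Function.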



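$\beta^\sharp$ is an abstract counterpart to $\beta$. It calculates all the interactions obtained by spawning a continuation in an abstract location: given $P \in \mathcal{V}$, $id^\sharp \in Id_1^\sharp$, $loc^\sharp \in \mathcal{F}(Lbl, Id_2^\sharp)$ and $E^\sharp \in \mathcal{F}(BN(\mathcal{S}) \times BN(\mathcal{S}), Id_2^\sharp)$, $\beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)$ gives a pair $(a, b) \in \mathcal{C}^\sharp$ which describes all the interactions obtained by spawning a syntactic component $P$ identified by a marker described by $id^\sharp$, in a location described by $loc^\sharp$ and with an environment described by $E^\sharp$, as expressed by the following proposition:

Proposition 2. $\beta(P, id, (\ell, id_\ell), E) \in \gamma(\beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp))$ for all $\ell \in Lbl$ and $id, id_\ell \in Id$ with $id \in \gamma_1(id^\sharp)$ and $(id, id_\ell) \in \gamma_2(loc^\sharp(\ell))$, and for all $E \in \mathcal{F}(FN(P), BN(\mathcal{S}) \times Id)$ such that $\forall m, n \in BN(\mathcal{S}), \forall id_n \in Id,\ [E(m) = (n, id_n) \implies (id, id_n) \in \gamma_2(E^\sharp(m, n))]$.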

We now give some intuition about the abstract transition rules. For the sake of brevity, we focus on the in migration abstract rule. The three syntactic
$C_0^\sharp = \beta^\sharp(\mathcal{S}, \varepsilon^\sharp, [top \mapsto dpush(\varepsilon^\sharp)], \emptyset)$

Fig. 5. Abstract Initial Configuration.

Let (/,<?) e C**, if there are A = n*[»], /r = ip = in^o.P, 

if Uio{Mh,nm) I lx e Lbl, rim € f?A'(5)} / 

then (f,g) □•*{(/', g); /3*(P, Zoc", A*)} where 

- A(^a, nm) = assert{sync, /(A, Za) •'* f{n, l\) •** f{rp, i) •** m, rim) •** girp, o, rim)), 

- sync = {(2, 4); (1, 6); (8, 10); (3, 7); (5, 9)}, 

- /' = m\j) ^ U2({/(^>i)} U {71(1,3) (A(lA,n„,)) I lx € Lbl, nm G 7?Af(5)})], 

- iS = LJ^{77(5)(A(iA,nm)) I lx G Lbl,Um G BAf{S)}, 

- loc^ = [i Ll2{77(5,6)(A(ZA,n™)) j lx G Lbl,nm G Z3A/'(5)}], 

- F* = [(<j,r) 77(1,2) (assert({(l, 3)}, {g{(l),g,r) •* iS))),'i{q,r) G BJ\T{S)'^]. 

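Let $(f, g) \in \mathcal{C}^\sharp$; if there are $\lambda = m^i[*]$, $\mu = n^j[*]$ and $\psi = out\ o.P$,

if $\sqcup_{10}\{A(l_\lambda, n_m) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\} \neq \bot_{10}$,
then $(f, g) \leadsto^\sharp \{(f', g);\ \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)\}$ where

- $A(l_\lambda, n_m) = assert(sync,\ f(\lambda, l_\lambda) \bullet^\sharp f(\mu, i) \bullet^\sharp f(\psi, j) \bullet^\sharp g(\lambda, m, n_m) \bullet^\sharp g(\psi, o, n_m))$,

- $sync = \{(1,4); (1,7); (3,6); (5,9); (8,10)\}$,

- $f' = f[(\mu, l_\lambda) \mapsto \sqcup_2(\{f(\mu, l_\lambda)\} \cup \{\Pi_{(3,2)}(A(l_\lambda, n_m)) \mid n_m \in BN(S)\}),\ \forall l_\lambda \in Lbl]$,

- $id^\sharp = \sqcup_1\{\Pi_{(5)}(A(l_\lambda, n_m)) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\}$,

- $loc^\sharp = [j \mapsto \sqcup_2\{\Pi_{(5,6)}(A(l_\lambda, n_m)) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\}]$,

- $E^\sharp = [(q, r) \mapsto \Pi_{(1,2)}(assert(\{(1,3)\}, g(\psi, q, r) \bullet^\sharp id^\sharp)),\ \forall (q, r) \in BN(S)^2]$.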

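Let $(f, g) \in \mathcal{C}^\sharp$; if there are $\lambda = open^i\ m.P$ and $\mu = o^j[*]$,

if $\sqcup_8\{A(l_\lambda, n_m) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\} \neq \bot_8$,
then $(f, g) \leadsto^\sharp \{(f', g);\ \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)\}$ where

- $A(l_\lambda, n_m) = assert(sync,\ f(\lambda, l_\lambda) \bullet^\sharp f(\mu, l_\lambda) \bullet^\sharp g(\lambda, m, n_m) \bullet^\sharp g(\mu, o, n_m))$,

- $sync = \{(1,5); (2,4); (3,7); (6,8)\}$,

- $f' = f[(\psi, l_\lambda) \mapsto \sqcup_2(\{f(\psi, l_\lambda)\} \cup \{B(\psi, l_\lambda, n_m) \mid n_m \in BN(S)\}),\ \forall \psi \in V,\ \forall l_\lambda \in Lbl]$,
where $B(\psi, l_\lambda, n_m) = \Pi_{(9,2)}(assert(\{(3,10)\}, A(l_\lambda, n_m) \bullet^\sharp f(\psi, j)))$,

- $id^\sharp = \sqcup_1\{\Pi_{(1)}(A(l_\lambda, n_m)) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\}$,

- $loc^\sharp = [l_\lambda \mapsto \sqcup_2\{\Pi_{(1,2)}(A(l_\lambda, n_m)) \mid n_m \in BN(S)\},\ \forall l_\lambda \in Lbl]$,

- $E^\sharp = [(q, r) \mapsto \Pi_{(1,2)}(assert(\{(1,3)\}, g(\lambda, q, r) \bullet^\sharp id^\sharp)),\ \forall (q, r) \in BN(S)^2]$.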

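Let $(f, g) \in \mathcal{C}^\sharp$; if there are $\lambda = !open^i\ m.P$ and $\mu = o^j[*]$,

if $\sqcup_8\{A(l_\lambda, n_m) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\} \neq \bot_8$,

then $(f, g) \leadsto^\sharp \{(f', g);\ \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)\}$ where

- $A(l_\lambda, n_m) = assert(sync,\ f(\lambda, l_\lambda) \bullet^\sharp f(\mu, l_\lambda) \bullet^\sharp g(\lambda, m, n_m) \bullet^\sharp g(\mu, o, n_m))$,

- $sync = \{(1,5); (2,4); (3,7); (6,8)\}$,

- $f' = f[(\psi, l_\lambda) \mapsto \sqcup_2(\{f(\psi, l_\lambda)\} \cup \{B(\psi, l_\lambda, n_m) \mid n_m \in BN(S)\}),\ \forall \psi \in V,\ \forall l_\lambda \in Lbl]$,
where $B(\psi, l_\lambda, n_m) = \Pi_{(9,2)}(assert(\{(3,10)\}, A(l_\lambda, n_m) \bullet^\sharp f(\psi, j)))$,

- $id^\sharp = \sqcup_1\{\Pi_{(1)}(loc^\sharp(l_\lambda)) \mid l_\lambda \in Lbl\}$,

- $loc^\sharp = [l_\lambda \mapsto push_{(i,j)}(\sqcup_3\{\Pi_{(1,3,2)}(A(l_\lambda, n_m)) \mid n_m \in BN(S)\}),\ \forall l_\lambda \in Lbl]$,

- $E^\sharp = [(q, r) \mapsto push_{(i,j)}(\sqcup_3\{I(q, r, l_\lambda, n_m) \mid l_\lambda \in Lbl,\ n_m \in BN(S)\}),\ \forall (q, r) \in BN(S)^2]$
where $I(q, r, l_\lambda, n_m) = \Pi_{(1,3,10)}(assert(\{(1,9)\}, A(l_\lambda, n_m) \bullet^\sharp g(\lambda, q, r)))$.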

Fig. 6. Abstract Move Rules. 
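Let $(f, g) \in \mathcal{C}^\sharp$; if there are $\lambda = (n)^i.P$ and $\mu = \langle m \rangle^j$, two sub-processes of $S$,
if $\sqcup_4\{A(l_\lambda) \mid l_\lambda \in Lbl\} \neq \bot_4$,

then $(f, g) \leadsto^\sharp \{(f, g);\ \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)\}$ where

- $A(l_\lambda) = assert(\{(2,4)\}, f(\lambda, l_\lambda) \bullet^\sharp f(\mu, l_\lambda))$,

- $id^\sharp = \sqcup_1\{\Pi_{(1)}(A(l_\lambda)) \mid l_\lambda \in Lbl\}$,

- $loc^\sharp = [l_\lambda \mapsto \Pi_{(1,2)}(A(l_\lambda)),\ \forall l_\lambda \in Lbl]$,

- $E^\sharp = [(q, r) \mapsto \sqcup_2\{I(q, r, l_\lambda) \mid l_\lambda \in Lbl\},\ \forall (q, r) \in BN(S)^2]$
where $I(q, r, l_\lambda) = \begin{cases} \Pi_{(1,6)}(assert(\{(3,5)\}, A(l_\lambda) \bullet^\sharp g(\mu, m, r))) & \text{if } q = n \\ \Pi_{(1,6)}(assert(\{(1,5)\}, A(l_\lambda) \bullet^\sharp g(\lambda, q, r))) & \text{otherwise} \end{cases}$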



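Let $(f, g) \in \mathcal{C}^\sharp$; if there are $\lambda = !(n)^i.P$ and $\mu = \langle m \rangle^j$, two sub-processes of $S$,
if $\sqcup_4\{A(l_\lambda) \mid l_\lambda \in Lbl\} \neq \bot_4$,

then $(f, g) \leadsto^\sharp \{(f, g);\ \beta^\sharp(P, id^\sharp, loc^\sharp, E^\sharp)\}$ where

- $A(l_\lambda) = assert(\{(2,4)\}, f(\lambda, l_\lambda) \bullet^\sharp f(\mu, l_\lambda))$,

- $id^\sharp = \sqcup_1\{\Pi_{(1)}(loc^\sharp(l_\lambda)) \mid l_\lambda \in Lbl\}$,

- $loc^\sharp = [l_\lambda \mapsto push_{(i,j)}(\Pi_{(1,3,2)}(A(l_\lambda))),\ \forall l_\lambda \in Lbl]$,

- $E^\sharp = [(q, r) \mapsto \sqcup_2\{I(q, r, l_\lambda) \mid l_\lambda \in Lbl\},\ \forall (q, r) \in BN(S)^2]$ where

$I(q, r, l_\lambda) = \begin{cases} push_{(i,j)}(\Pi_{(1,3,6)}(assert(\{(3,5)\}, A(l_\lambda) \bullet^\sharp g(\mu, m, r)))) & \text{if } q = n \\ push_{(i,j)}(\Pi_{(1,3,6)}(assert(\{(1,5)\}, A(l_\lambda) \bullet^\sharp g(\lambda, q, r)))) & \text{otherwise} \end{cases}$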



Fig. 7. Abstract Communication Rules. 



components $\lambda$, $\mu$ and $\psi$ denote the three threads involved in the non-standard in migration rule. For each pair $(l_\lambda, n_m)$, we check whether there can be a configuration containing instances of $\lambda$, $\mu$ and $\psi$ such that both instances of $\lambda$ and $\mu$ are surrounded by an instance of an ambient labeled with $l_\lambda$, $\psi$ is located in the instance of $\lambda$, and both the ambient name of the instance of $\mu$ and the name that the capability of $\psi$ works on are linked to an ambient name created by an instance of the action $(\nu n_m)$. We then compute $A(l_\lambda, n_m)$, which describes the relation between the involved markers. First, we gather the descriptions of all the involved marker pair abstractions: the first three marker pair abstractions describe the locations of $\lambda$, $\mu$ and $\psi$, while the last two describe the linkage of the syntactic ambient names $m$ in $\mu$ and $o$ in $\psi$. Then, we take into account synchronization conditions between the components of these abstract tuples: the third and the seventh (resp. the fifth and the ninth) components shall be equal, since they both denote the thread marker associated with $\mu$ (resp. $\psi$); the synchronization between the second and the fourth components denotes that $\lambda$ and $\mu$ must be located in the same instance of the ambient labeled $l_\lambda$; the one between the first and the sixth denotes that $\psi$ must be located in the right instance of the ambient $\lambda$; and the one between the eighth and the tenth enforces the equality between the ambient name of $\mu$ and the name that the capability of $\psi$ works on. We then extract from $A(l_\lambda, n_m)$ an approximation of the interactions which may be created by performing the in migration rule on this redex: the instance of $\lambda$ can move inside the one of $\mu$, keeping the same history marker







Jerome Feret 



(given by the first component of the abstract tuples), but its location marker is then the history marker of the instance of $\mu$ (given by the third component). We are then left to spawn the continuation of $\psi$, all of whose markers remain unchanged. The other abstract rules follow the same schema.
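To make the roles of concatenation, `assert` and projection in the in migration rule concrete, here is a minimal Python sketch on the concrete side: relations are plain sets of marker tuples (markers are strings), not abstract elements of $Id^\sharp_n$. The marker values are hypothetical, chosen so that exactly one combination satisfies the synchronization set $sync$ of the rule.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Rel = Set[Tuple[str, ...]]  # a set of n-tuples of markers


def concat(*rels: Rel) -> Rel:
    """Concrete counterpart of the abstract concatenation: all tuple concatenations."""
    out: Rel = {()}
    for r in rels:
        out = {x + y for x, y in product(out, r)}
    return out


def assert_eq(pairs: Iterable[Tuple[int, int]], rel: Rel) -> Rel:
    """Keep the tuples whose (1-indexed) components agree on every pair."""
    pairs = list(pairs)
    return {t for t in rel if all(t[i - 1] == t[j - 1] for i, j in pairs)}


def project(idx: Tuple[int, ...], rel: Rel) -> Rel:
    """Keep the listed (1-indexed) components, in the given order."""
    return {tuple(t[i - 1] for i in idx) for t in rel}


# Hypothetical marker pairs for one redex of the in rule.
f_lam = {("a1", "L")}              # f(lambda, l): (marker of lambda, marker of the l ambient)
f_mu = {("b1", "L"), ("b2", "M")}  # f(mu, l)
f_psi = {("p1", "a1")}             # f(psi, i): psi lies in the instance of lambda
g_mu = {("b1", "c"), ("b2", "c")}  # g(mu, m, n_m): (marker of mu, creator of its name)
g_psi = {("p1", "c")}              # g(psi, o, n_m)

sync = {(2, 4), (1, 6), (8, 10), (3, 7), (5, 9)}
A = assert_eq(sync, concat(f_lam, f_mu, f_psi, g_mu, g_psi))
new_location = project((1, 3), A)  # lambda now sits inside the instance b1 of mu
```

Only the combination using `b1` survives: `b2` lies in another location (`M`), so the pair (2, 4) rejects it.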



6 Abstract Domain 

Various domains can be used to instantiate the family of parametric domains $(Id^\sharp_n)_{n \in \mathbb{N}}$, depending on the expected complexity and accuracy. We propose three particular instantiations. The first one abstracts away the information about markers; the result is a uniform control flow analysis. The second one only keeps the equality relationships among markers, and gives an analysis which presents strong connections with group creation. The third one allows for the algebraic comparison of markers, which is, to the best of our knowledge, out of the range of the analyses presented in the literature.

6.1 Uniform Control Flow Analysis 

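A uniform analysis can be obtained by instantiating all the elements of the family $(Id^\sharp_n)_{n \in \mathbb{N}}$ with the lattice $(\{\bot, \top\}, \sqsubseteq)$. $\{\bot, \top\}$ is related to $\wp(Id^n)$ by the concretization function defined by $\gamma_n(\bot) = \emptyset$ and $\gamma_n(\top) = Id^n$. The abstract primitives are then defined as follows:

- $\varepsilon^\sharp = \top$;

- $\forall A \in \wp(\{\bot, \top\})$, $\sqcup_n(A) = \begin{cases} \top & \text{if } \top \in A \\ \bot & \text{otherwise;} \end{cases}$

- $\forall a, b \in \{\bot, \top\}$, $a \bullet^\sharp b = \begin{cases} \bot & \text{if } a = \bot \text{ or } b = \bot \\ \top & \text{if } a = \top \text{ and } b = \top; \end{cases}$

- $\forall a \in \{\bot, \top\}$, $assert(A, a) = a$, $\Pi_X(a) = a$ and $push_{(i,j)}(a) = a$.

The resulting analysis is always at least as precise as [?], but also takes unreachable code into account.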



6.2 Confinement 

We now focus on the equality relationships between markers. This allows us to analyze whether an ambient name can only be communicated to the recursive instance which has created it, and whether a thread is always surrounded by an ambient activated by the recursive instance which has spawned it.

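We define $Id^\sharp_n$ as the lifted set $\mathcal{G}_n \cup \{\bot_n\}$ of all the non-oriented graphs with vertex set $[|1;n|]$. The transitive closure of a graph $(G, \frown)$ is denoted by $(G, \frown^*)$. The pre-order, the concretization function and the abstract primitives are defined on $\mathcal{G}_n$ as follows, and they can be easily lifted to $\mathcal{G}_n \cup \{\bot_n\}$:

- $\forall \frown_1, \frown_2 \in \wp([|1;n|]^2)$, $([|1;n|], \frown_1) \sqsubseteq ([|1;n|], \frown_2) \iff \frown_2 \subseteq \frown_1^*$;

- $\forall ([|1;n|], \frown) \in \mathcal{G}_n$, $\gamma_n(([|1;n|], \frown)) = \{(id_i)_{i \in [|1;n|]} \mid k \frown l \Rightarrow id_k = id_l\}$;

- $\varepsilon^\sharp = (\{1\}, \emptyset)$;

- $\forall A \in \wp(\mathcal{G}_n)$, $\sqcup_n(A) = ([|1;n|], \frown_\sqcup)$, where $i \frown_\sqcup j \iff \forall ([|1;n|], \frown) \in A,\ i \frown^* j$;

- $\forall a = ([|1;n|], \frown_a) \in \mathcal{G}_n$, $\forall b = ([|1;m|], \frown_b) \in \mathcal{G}_m$, $a \bullet^\sharp b = ([|1;n+m|], \frown)$, where
$i \frown j \iff \begin{cases} i \frown_a j & \text{if } i, j \in [|1;n|] \\ (i-n) \frown_b (j-n) & \text{if } i, j \in [|n+1;n+m|]; \end{cases}$

- $\forall ([|1;n|], \frown) \in \mathcal{G}_n$, $\forall A \in \wp([|1;n|]^2)$, $assert(A, ([|1;n|], \frown)) = ([|1;n|], A \cup \frown)$;

- $\forall a = ([|1;n|], \frown_a) \in \mathcal{G}_n$, $\Pi_{(s_1, \dots, s_p)}(a) = ([|1;p|], \frown_\Pi)$, where $i \frown_\Pi j \iff i, j \in [|1;p|]$ and $s_i \frown_a^* s_j$;

- $\forall a \in \mathcal{G}_3$, $push_{(i,j)}(a) = ([|1;2|], \emptyset)$.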



As in [?], this analysis can only prove that an ambient name is confined inside the scope of the recursive instance which has declared it. It is unable to prove that a name which first exits this scope can then only be sent back to the recursive instance which created it.
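The confinement domain can be prototyped in a few lines of Python. The encoding is an assumption made for illustration: an element of $\mathcal{G}_n$ is a pair `(n, edges)`, the transitive closure is computed with a union-find structure, and `None` stands for the bottom element of the lifted set. In this sketch, projection also links two output positions that select the same input component, since they denote the same marker.

```python
from typing import FrozenSet, Iterable, Optional, Sequence, Tuple

# Hypothetical encoding of an element of G_n: (n, edges), edges over vertices 1..n.
Graph = Tuple[int, FrozenSet[Tuple[int, int]]]


def closure(g: Graph) -> FrozenSet[Tuple[int, int]]:
    """All pairs (i, j), i != j, joined by a path in the non-oriented graph."""
    n, edges = g
    parent = list(range(n + 1))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return frozenset((i, j) for i in range(1, n + 1) for j in range(1, n + 1)
                     if i != j and find(i) == find(j))


def join(graphs: Iterable[Graph]) -> Optional[Graph]:
    """i ~ j in the join iff i ~* j in every argument; None plays the role of bottom."""
    graphs = list(graphs)
    if not graphs:
        return None
    return (graphs[0][0], frozenset.intersection(*(closure(g) for g in graphs)))


def concat(a: Graph, b: Graph) -> Graph:
    """Place the vertices of b after those of a: vertex k of b becomes k + n."""
    n, ea = a
    m, eb = b
    return (n + m, ea | frozenset((i + n, j + n) for i, j in eb))


def assert_eq(pairs: Iterable[Tuple[int, int]], g: Graph) -> Graph:
    """Add the synchronization edges."""
    n, edges = g
    return (n, edges | frozenset(pairs))


def project(idx: Sequence[int], g: Graph) -> Graph:
    """Keep components idx[0], idx[1], ...; positions selecting the same or linked
    components are linked in the result."""
    rel = closure(g)
    p = len(idx)
    return (p, frozenset((k, l) for k in range(1, p + 1) for l in range(1, p + 1)
                         if k != l and (idx[k - 1] == idx[l - 1]
                                        or (idx[k - 1], idx[l - 1]) in rel)))


def push(pattern: Tuple[int, int], g: Graph) -> Graph:
    """A freshly pushed marker is related to nothing: ([|1;2|], empty set)."""
    return (2, frozenset())


a = (2, frozenset({(1, 2)}))  # markers 1 and 2 are known to be equal
b = (2, frozenset())          # no known equality
r = project((1, 3), assert_eq({(2, 3)}, concat(a, b)))  # components 1 and 3 become linked
```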



6.3 Non-uniform Analysis with Algebraic Comparisons 

We now abstract algebraic comparisons between markers. Following Prop. ?, we only abstract the right comb of each tree. We then use the reduced product of two abstractions. Our first abstraction consists in abstracting component-wise the shape of the history markers associated with threads, their locations, and their ambient names. We use a regular description of sets of trees: we introduce $Id^\sharp_n$ as the set of all the n-tuples of regular automata over the alphabet $Lbl^2$. $Id^\sharp_n$ is related to $\wp(Id^n)$ by the following concretization function:

$\gamma_n((A_i)_{i \in [|1;n|]}) = \{(id_i)_{i \in [|1;n|]} \mid \forall i \in [|1;n|],\ id_i \in \mathcal{L}(A_i)\}$



- $\varepsilon^\sharp$ is an automaton which only recognizes the word $\varepsilon$;

- $\sqcup_n$ applies component-wise the classical finite union of regular automata;

- $\bullet^\sharp$ (resp. $\Pi$) is the classical concatenation (resp. projection) of tuples;

- $assert(\{(a, b)\} \cup A, Q)_j = \begin{cases} (assert(A, Q))_j & \text{if } j \notin \{a; b\} \\ (assert(A, Q))_a \cap (assert(A, Q))_b & \text{otherwise;} \end{cases}$

- since there can be infinite increasing sequences of regular languages, we need a widening operator $\nabla$. It is sufficient to construct a widening operator for regular automata and to apply it component-wise. Given $\delta \in \mathbb{N}^*$, a convenient choice for $A_1 \nabla A_2$ consists in quotienting the set of the states of the automaton $A_1 \cup A_2$ by the relation that identifies the states which have the same $\delta$-depth residue*. The higher $\delta$ is, the more accurate and expensive the analysis is.
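The widening by $\delta$-depth residues can be prototyped on nondeterministic automata. The sketch below rests on hypothetical choices: letters are single characters rather than elements of $Lbl^2$, and an automaton is a tuple (states, transitions, initial states, final states). Quotienting an automaton can only enlarge its language, so the result over-approximates $A_1 \cup A_2$.

```python
from typing import FrozenSet, Hashable, Set, Tuple

State = Hashable
Automaton = Tuple[Set[State], Set[Tuple[State, str, State]], Set[State], Set[State]]


def union(a1: Automaton, a2: Automaton) -> Automaton:
    """Disjoint union A1 u A2: tag each state with the index of its automaton."""
    def tag(k: int, a: Automaton) -> Automaton:
        states, trans, init, fin = a
        return ({(k, q) for q in states},
                {((k, p), x, (k, q)) for p, x, q in trans},
                {(k, q) for q in init},
                {(k, q) for q in fin})
    s1, t1, i1, f1 = tag(1, a1)
    s2, t2, i2, f2 = tag(2, a2)
    return s1 | s2, t1 | t2, i1 | i2, f1 | f2


def residue(a: Automaton, q: State, delta: int) -> FrozenSet[str]:
    """Words of length <= delta that can be read from q (the delta-depth residue)."""
    trans = a[1]
    words = {""}
    frontier = {("", q)}
    for _ in range(delta):
        frontier = {(u + x, r) for u, p in frontier for s, x, r in trans if s == p}
        words |= {u for u, _state in frontier}
    return frozenset(words)


def widen(a1: Automaton, a2: Automaton, delta: int) -> Automaton:
    """A1 widen A2: merge the states of A1 u A2 that share the same residue."""
    a = union(a1, a2)
    states, trans, init, fin = a
    cls = {q: residue(a, q, delta) for q in states}  # a class is named by its residue
    return (set(cls.values()),
            {(cls[p], x, cls[q]) for p, x, q in trans},
            {cls[q] for q in init},
            {cls[q] for q in fin})


def accepts(a: Automaton, word: str) -> bool:
    """Standard NFA simulation."""
    _, trans, init, fin = a
    current = set(init)
    for x in word:
        current = {q for p, y, q in trans if p in current and y == x}
    return bool(current & fin)


A1 = ({0, 1}, {(0, "a", 1)}, {0}, {1})                     # recognizes {a}
A2 = ({0, 1, 2}, {(0, "a", 1), (1, "a", 2)}, {0}, {2})     # recognizes {aa}
W = widen(A1, A2, 1)
```

With $\delta = 1$, the five states collapse into two classes and the widened automaton recognizes a+, generalizing the growing sequence a, aa, ... in a single step.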



* The $\delta$-depth residue set of a state $q$ in a labeled transition system is defined as $\{u \in \Sigma^* \mid |u| \leq \delta \text{ and } \exists q' \in Q,\ q \xrightarrow{u} q'\}$.

Our second abstraction captures non-uniform comparisons between the number of occurrences of each pattern inside sets of marker pairs. For each $n \in \mathbb{N}$,
we introduce the set $V_n$ of distinct variables $\{x_i^\lambda \mid i \in [|1;n|],\ \lambda \in Lbl^2\}$. The abstract domain $\wp(\mathbb{N}^{V_n})$ is related to $\wp(Id^n)$ by the monotonic map

$\gamma_n(A) = \{(id_i)_{i \in [|1;n|]} \mid (x_i^\lambda \mapsto |id_i|_\lambda) \in A\}$,

where $|id|_\lambda$ denotes the number of occurrences of the pattern $\lambda$ in $id$. $\wp(Id^n)$ is then related to the complete lattice of the affine equality systems on the set of variables $V_n$, denoted by $Id^\sharp_n$. This domain is described with its lattice operations in [?]. We describe the remaining abstract primitives as follows:

- $\varepsilon^\sharp$ is given by the system $\{x_1^\lambda = 0,\ \forall \lambda \in Lbl^2\}$;

- given $K \in Id^\sharp_n$ and $K' \in Id^\sharp_{n'}$, we obtain the abstract concatenation of $K$ and $K'$ by renaming each variable $x_i^\lambda$ to $x_{i+n}^\lambda$ in $K'$, and gathering all the constraints of the two systems;

- $assert(\{(i_1, j_1); \dots; (i_p, j_p)\}, K)$ corresponds to inserting all the constraints of the form $x_{i_k}^\lambda = x_{j_k}^\lambda$, $\forall k \in [|1;p|]$, $\forall \lambda \in Lbl^2$, in $K$;

- $\Pi_{(s_1, \dots, s_p)}(K)$ corresponds to collecting all the constraints involving just the variables $x_{s_k}^\lambda$ and then renaming each variable $x_{s_k}^\lambda$ into the variable $x_k^\lambda$;

- $push_{(i,j)}(K)$ is obtained by replacing, in each constraint, each occurrence of the variable $x_2^{(i,j)}$ by the expression $x_2^{(i,j)} - 1$, and then applying the abstract projection $\Pi_{(2,3)}$.
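The affine primitives manipulate systems of linear equalities. The following Python sketch uses a hypothetical encoding (the variable $x_i^\lambda$ is the pair `(pattern, i)`, a constraint is a coefficient map plus a constant, and patterns are strings instead of label pairs) and follows the description above literally: projection keeps the constraints that mention only the selected components. Dropping the other constraints is sound but may lose precision unless the system is first put in a suitably reduced form, as a full implementation of the affine domain would do.

```python
from typing import Dict, Iterable, List, Tuple

Var = Tuple[str, int]                    # (pattern, component index)
Constraint = Tuple[Dict[Var, int], int]  # sum(coef * var) == const


def concat(k1: List[Constraint], n1: int, k2: List[Constraint]) -> List[Constraint]:
    """K1 has n1 components: shift the component indices of K2 by n1, then gather."""
    shifted = [({(p, i + n1): c for (p, i), c in co.items()}, k) for co, k in k2]
    return k1 + shifted


def assert_eq(pairs: Iterable[Tuple[int, int]], patterns: Iterable[str],
              k: List[Constraint]) -> List[Constraint]:
    """Add x_a^p - x_b^p = 0 for every pair (a, b), a != b, and every pattern p."""
    patterns = list(patterns)
    return k + [({(p, a): 1, (p, b): -1}, 0)
                for a, b in pairs if a != b for p in patterns]


def push(pattern: str, k: List[Constraint]) -> List[Constraint]:
    """Substitute x_2^pattern -> x_2^pattern - 1, then keep (and renumber) the
    constraints that only involve components 2 and 3."""
    out = []
    for co, const in k:
        const += co.get((pattern, 2), 0)  # c*(x - 1) + rest = k  <=>  c*x + rest = k + c
        if all(i in (2, 3) for _, i in co):
            out.append(({(p, i - 1): c for (p, i), c in co.items()}, const))
    return out


# Components: 1 = marker of lambda, 2 = marker of mu, 3 = location marker.
K = assert_eq({(2, 3)}, ["p"], [])  # mu's marker and the location contain as many p's
print(push("p", K))  # [({('p', 1): 1, ('p', 2): -1}, 1)]
```

After the push, the new thread marker contains exactly one more occurrence of `p` than its location marker.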



Example 3. We run the third analysis on the system of Example ?; we denote the result by $(f, g)$. We succeed in proving that an ambient name created by the binder $(\nu q)$ can only be communicated either to the agent in a "request" ambient surrounded by a $p$ ambient, or to the agent $(rep)^9$ in an "answer" ambient surrounded by a $p$ ambient. In the second case, we also capture these properties:

$\gamma_2(g((rep)^9, rep, q)) = \{((3,19).(11,20)^n.(11,21),\ (11,20)^n.(11,21)) \mid n \in \mathbb{N}\}$
$\gamma_2(f((rep)^9, 8)) = \{((3,19).(11,20)^n.(11,21),\ (3,19).(11,20)^n.(11,21)) \mid n \in \mathbb{N}\}$
$\gamma_2(f(answer^8[*], 12)) = \{((3,19).(11,20)^n.(11,21),\ (11,20)^n.(11,21)) \mid n \in \mathbb{N}\}$
$\gamma_2(g(p^{12}[*], p, p)) = \{((11,20)^n.(11,21),\ (11,20)^n.(11,21)) \mid n \in \mathbb{N}\}$;

this proves that the ambient name communicated inside the "query" ambient and the name of the packet which surrounds this "query" ambient have both been declared by the same recursive instance of the client resource. □



Remark 2. Our confinement analysis is not an abstraction of our non-uniform analysis, because two distinct markers may be recognized by the same automaton while containing the same number of occurrences of each pattern (i.e. having the same Parikh vector [?]). The equality of the Parikh vectors implies the equality of the markers if they are recognized by an automaton only composed of an acyclic path between an initial and a final state, without embedded cycles, and such that the Parikh vectors of the cycles of this automaton are linearly independent. Nevertheless, we may use the reduced product of our confinement analysis and our non-uniform control flow analysis to solve this problem.
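Remark 2 hinges on the fact that equal Parikh vectors do not force equal markers. A minimal Python illustration, with hypothetical label pairs as patterns:

```python
from collections import Counter
from typing import Hashable, Sequence


def parikh(word: Sequence[Hashable]) -> Counter:
    """Parikh vector: the number of occurrences of each pattern in a word."""
    return Counter(word)


# Two distinct markers over Lbl^2 (the label pairs are chosen for illustration).
u = [(1, 2), (3, 4)]
v = [(3, 4), (1, 2)]
```

Both words are accepted, for instance, by a one-state automaton with a loop on each pattern, so neither the regular abstraction nor the pattern counts can tell them apart.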
6.4 About the Complexity of Our Analyses

We briefly describe the time complexity of our analyses. In the following table, the first line denotes the redex detection and information propagation, the second line denotes the cost of performing an abstract operation, the third one denotes the maximum number of iterations, and the last one gives the resulting time complexity.

                                       0-CFA         confinement    u-CFA
  scan                                 t.n^?.ψ       t.n^?.ψ        t.n^?.ψ
  domain complexity                    1             1              ?
  height of the abstract iteration     i             i              i.a^?
  time-complexity                      i.t.n^?.ψ     i.t.n^?.ψ      i.t.n^?.ψ

where $N$ is the system length; $t$ is the number of distinct transition labels which occur during the analysis: $t$ is cubic in $N$ in the worst case, but only quasi-linear in practice; $n$ is the sum of the number of name binders and the number of ambient activators: it is linear in $N$; $i$ is the number of interactions between the agents of the system: in practice $i$ is quasi-quadratic in $N$, but it is cubic in the worst case; $\psi$ is a bound on the number of interactions with $Q$, for any fixed process $Q$: $\psi$ is quadratic in $N$ in the worst case, but quasi-linear in practice; $a$ is the number of patterns occurring in markers: it is either linear or quadratic in $N$, depending on the choice for the history markers (cf. Prop. ?); $\delta \in \mathbb{N}^*$ is chosen as a parameter of our abstraction.

Both effective transitions and effective interactions are detected during our 
iteration. This allows us to speed up the analysis, the cost of which only depends 
on the number of both effective transition kinds and effective interactions. 

7 Conclusion and Perspectives 

We have described a parametric framework for automatically inferring, in polynomial time, a description of the interferences between recursive instances of the agents of a mobile ambient system. Our framework also applies when the model is extended with mobility control or higher-order communication. As was done for the π-calculus [?], we would like to extend this framework to analyze the behaviour of an open system executed in a hostile context.

This framework is highly extensible and is very likely to be enriched by an occurrence counting analysis. The analyses in the literature [?] are not polynomial, but we are working on a polynomial one inspired by [?]. Our long-term goal is to use the low-level properties we compute to synthesize high-level properties which may be expressed in a modal logic, as suggested in [3].
Abstract. In the past decade, explosive growth in computer networks
has brought security issues to the forefront. One of the greatest challenges 
in computer security today is the software assurance problem: How do 
we deal with the fact that our most trusted software, even our security 
software itself, is often buggy? 

In this talk, I will discuss how static analysis can help with the software 
assurance problem. I will describe some recent experience with static 
analysis tools for vulnerability detection. I will also survey a number of 
open problems in the field and suggest a few promising directions for 
future research. 
Abstract. We propose a new method to check authenticity properties of
cryptographic protocols. First, code up the protocol in the spi-calculus of 
Abadi and Gordon. Second, specify authenticity properties by annotating 
the code with correspondence assertions in the style of Woo and Lam. 
Third, figure out types for the keys, nonces, and messages of the protocol. 
Fourth, check that the spi-calculus code is well-typed according to a novel
type and effect system. Our main theorem guarantees that any well-typed 
protocol is robustly safe, that is, its correspondence assertions are true 
in the presence of any opponent expressible in spi. It is feasible to apply 
this method by hand to several well-known cryptographic protocols. It 
requires little human effort per protocol, puts no bound on the size of 
the opponent, and requires no state space enumeration. Moreover, the 
types for protocol data provide some intuitive explanation of how the 
protocol works. Our method has led us to the independent rediscovery 
of flaws in existing protocols and to the design of improved protocols. 
My talk will describe our method and give some simple examples. Papers
describing the method in detail appear elsewhere [?].
Abstract. Most current cryptographic protocol verifiers face the state
space explosion problem, and have to limit the number of executions
of the considered protocol during the verification. To solve these
problems, we introduce an abstract representation of cryptographic protocols,
based on Prolog rules, and use it to verify secrecy properties of protocols.



1 Introduction 

This short summary is a reference for the author's invited talk for SAS'01. It presents a new technique for verifying cryptographic protocols, which is described in detail in [?]. The design of cryptographic protocols is difficult and error-prone. This can be illustrated by attacks found in existing protocols [?]. It is therefore important to have automatic tools to verify these protocols. Many different techniques have been used in this area: model-checking [?], rewriting [?], theorem proving [?], typing [?], abstract interpretation (more references can be found in [?]). One of the most widely used techniques is model-checking, but it faces two important problems:

— A first problem is the state space explosion: even for small specifications of protocols, the number of states to explore during verification is very large. Hence, the verification of complex protocols becomes impossible.

— Most model-checking tools limit the number of the runs of the protocol, 
to ensure the termination of the verification process. Indeed, at each run, 
the protocol creates new values. For an unbounded number of runs, an un- 
bounded number of new values would be needed, and the number of states 
would be infinite. However, limiting the number of runs of the protocol has 
serious consequences: an attack that appears with more runs will remain 
undiscovered. This is a problem for the certification of protocols, and man- 
ual proofs have to be done to show that the result obtained for a small 
number of runs extends to an unbounded number. 

These problems have already been tackled by previous works: [?] avoids the state space explosion by using the Strand Space Model, but still sometimes limits the number of runs of the protocol to guarantee termination. [?] do not limit the number of runs of the protocol, by recycling the new values created at each run. However, they limit the number of parallel runs of protocols.

We introduce an abstract representation of cryptographic protocols, based on Horn clauses (the basis of Prolog), that solves these problems by performing well-chosen approximations that preserve attacks. This representation yields a more efficient analysis than [?], by representing rules that generate the knowledge of the attacker, instead of representing this knowledge by tree automata. It is also more precise than [?], since our analysis is relational, and it distinguishes values created at the same point of the protocol in different sessions.

2 Representation of Cryptographic Protocols 

We assume that the protocol is executed in the presence of an attacker that can 
listen to messages, compute, and send messages. 

In our representation, messages are represented by terms. For instance, the term $pencrypt(c_0, pk(sk))$ represents the encryption of $c_0$ under the public key $pk(sk)$, corresponding to the secret key $sk$. The facts are of the form $attacker(M)$, meaning that the attacker may have the term $M$. Our abstract representation of the protocol consists of inference rules of the knowledge of the attacker. For example, the rule:

$attacker(pencrypt(m, pk(sk))) \wedge attacker(sk) \rightarrow attacker(m)$

means that the attacker can decrypt a message when it has the secret key: if the attacker has the encrypted message $pencrypt(m, pk(sk))$ and the secret key $sk$, it can obtain the cleartext $m$.

We also represent the actions of the agents involved in the protocol as inference rules of the knowledge of the attacker. Indeed, if the agent $A$ has received the messages $M_1, \dots, M_n$ and sends the message $M$, we consider that the attacker has relayed the messages $M_1, \dots, M_n$ and intercepts $A$'s reply $M$. This yields the rule:

$attacker(M_1) \wedge \dots \wedge attacker(M_n) \rightarrow attacker(M)$.

If the attacker has the messages $M_1, \dots, M_n$, it can send them to $A$, simulating the beginning of a protocol run, and obtain the reply $M$.
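This clause-based reading of the attacker can be illustrated by naive forward chaining over ground clauses. It is only a sketch under strong assumptions: terms are Python tuples, the only built-in attacker rule is the decryption clause above, and constructor rules (encrypting, pairing) are omitted so that saturation terminates. It is not the resolution-based algorithm of the paper, which works on clauses with variables; the protocol clause and the names `hello`, `s` and `skA` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical term encoding: atoms are strings, applications are tuples
# (function symbol, arg1, ..., argk).


def pk(sk):
    return ("pk", sk)


def pencrypt(m, key):
    return ("pencrypt", m, key)


# Ground clause attacker(M1) and ... and attacker(Mn) -> attacker(M)
Clause = Tuple[Tuple[object, ...], object]


def saturate(initial: Iterable[object], clauses: List[Clause]) -> Set[object]:
    """Derive every fact attacker(M) reachable from the initial knowledge."""
    known = set(initial)
    while True:
        new = set()
        # Built-in rule: attacker(pencrypt(m, pk(sk))) and attacker(sk) -> attacker(m)
        for t in known:
            if isinstance(t, tuple) and t[0] == "pencrypt":
                _, m, key = t
                if isinstance(key, tuple) and key[0] == "pk" and key[1] in known:
                    new.add(m)
        # Protocol clauses fire once every premise has been derived.
        for premises, conclusion in clauses:
            if all(p in known for p in premises):
                new.add(conclusion)
        if new <= known:
            return known  # fixpoint reached
        known |= new


# On receiving "hello", an agent replies with s encrypted under A's public key.
clauses = [(("hello",), pencrypt("s", pk("skA")))]
knowledge = saturate({"hello", pk("skA")}, clauses)
print("s" in knowledge)  # False: s remains secret
```

Secrecy then reads as non-derivability: `s` only becomes derivable if some clause leaks `skA` to the attacker.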

Using these principles, we can abstract a cryptographic protocol by a set of Horn clauses. This representation is obviously an abstraction of the multi-set rewriting representation [?]: the number of times a message appears is ignored, to remember only that it has appeared. It is also an abstraction of the linear logic representation of [?]. Moreover, we have built an automatic translator from the applied pi calculus [3] (restricted to certain equational theories) to our representation. Our representation is more abstract than most usual ones:

— The messages are not organized into separate runs. For instance, the ith 
step of the protocol can be repeated several times when the previous steps 



Abstracting Cryptographic Protocols by Prolog Rules 435 



have been executed only once. More generally, a step of the protocol can 
be repeated any number of times as soon as the previous steps have been 
executed at least once. 

— Moreover, the freshness of nonces is modeled by considering fresh values as 
functions of the messages previously received by the creator of the value. 
Therefore, the fresh values are considered as different if and only if the pre- 
vious messages are different. The same nonces are reused in several sessions 
of the protocol, when the previous messages are the same. Therefore, we can 
have a finite space even when considering an infinite number of sessions. 
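The nonce approximation can be pictured with a hypothetical helper that builds a fresh value as a term over the messages its creator has received. Sessions that received the same messages get the same abstract nonce, which is what keeps the number of distinct values finite.

```python
from typing import Sequence, Tuple


def fresh(name: str, received: Sequence[str]) -> Tuple[str, ...]:
    """Abstract a fresh value as the term name(M1, ..., Mk), where M1..Mk are the
    messages its creator received before creating it (hypothetical encoding)."""
    return (name, *received)


# Three sessions of a responder; the first two received the same message.
sessions = [["m1"], ["m1"], ["m2"]]
nonces = {fresh("Nb", s) for s in sessions}
print(len(nonces))  # 2: the first two sessions share one abstract nonce
```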

These approximations are key to an efficient verification of the protocols, and enable us not to limit the number of executions of the considered protocol.

3 Verifying Secrecy Properties 

We use our representation to verify secrecy properties of protocols: if the fact 
attacker(M) cannot be derived from the rules representing the protocol, then the 
adversary cannot have M , and the term M remains secret. This is exactly the 
kind of information that usual Prolog systems compute. However, they do not 
terminate on the rules representing a protocol, so we have designed, proved cor- 
rect, and implemented an efficient algorithm to handle this particular situation. 
This algorithm is fully detailed in [?].

The experimental results show that many protocols from the literature, including Skeme [?], can be analyzed by our tool with very small resources: from less than 0.1 s for simple protocols to 23 s for the main mode of Skeme, and less than 2 MB of memory.
Abstract. Recently there has been considerable interest in programming
languages that encode security policies in type declarations. Type-checking
is used to determine whether a program enforces these policies.
This approach enjoys many of the benefits of static type-checking, but 
is particularly of interest because it can enforce information flow proper- 
ties such as noninterference, for which purely dynamic mechanisms are 
ineffective. 

Enforcing information flow properties for distributed systems adds new
challenges: mutual distrust among the principals, and untrusted hosts.
Our new approach, secure program partitioning, automatically rewrites a 
program into communicating subprograms that run securely on the set of 
available hosts yet collectively implement the original program. This fine- 
grained rewriting is based on the security types in the original program 
and the trust relationships among principals and hosts in the system. 
Computation in the original program is written in a single-host style, 
yet the resulting distributed system can satisfy the strong confidentiality 
and integrity properties specified by the program. 